Oct 5  Thu  John Fry (Sheffield)  Statistics Seminar  
14:00  The Mathematics of Financial Crashes  


Oct 5  Thu  Keith Harris (Sheffield)  Statistics Seminar  
14:00  Statistical Modelling and Inference for Radio-Tracking


Nov 2  Thu  Nancy Nicholls (Reading)  Statistics Seminar  
14:00  Getting Started: Data Assimilation for Very Large Inverse Problems in Environmental Science  


Nov 9  Thu  Clive Anderson (Sheffield)  Statistics Seminar  
14:00  Some Extreme Value Problems in Metal Fatigue  


Nov 16  Thu  David Scott (Auckland)  Statistics Seminar  
14:00  The hyperbolic and related distributions: problems of implementation  


Nov 23  Thu  Stuart Barber (Leeds)  Statistics Seminar  
14:00  Signal processing using complex Daubechies wavelets  


Nov 30  Thu  Goran Peskir (Manchester)  Statistics Seminar  
14:00  Optimal stopping  


Dec 7  Thu  Raj Bhansali (Liverpool)  Statistics Seminar  
14:00  Frequency Analysis of Chaotic Intermittency Maps with Slowly Decaying Correlations  


Dec 14  Thu  Stefanie Biedermann (Southampton)  Statistics Seminar  
14:00  Robust optimal designs for dose-response experiments


Feb 8  Thu  Elke Thonnes (University of Warwick)  Statistics Seminar  
14:00  Statistical analysis of pore patterns in fingerprints  


Feb 22  Thu  Ed Cripps (Sheffield)  Statistics Seminar  
14:00  Variable selection and covariance selection in multivariate Gaussian linear regression  


Mar 22  Thu  Søren Asmussen (Aarhus)  Statistics Seminar
14:00  Tail Probabilities for a Computer Reliability Problem  


May 3  Thu  Chris Williams (Edinburgh)  Statistics Seminar  
14:00  Gaussian processes and machine learning  


May 10  Thu  Simon Tavaré (Southern California)  Statistics Seminar
14:00  Stochastic processes in stem cell evolution  


May 31  Thu  Mark Davis (Imperial)  Statistics Seminar  
14:00  


Oct 11  Thu  Richard Jacques (University of Sheffield)  Statistics Seminar  
14:00  Classification Methods for the Analysis of High Content Screening Data  
Hicks Room K14  
Abstract: The current paradigm for the identification of candidate drugs within the pharmaceutical industry typically involves the use of high throughput screens. A high throughput screen allows a large number of compounds to be tested in a biological assay in order to identify any activity inhibiting or activating a biological process. From each of the assays run through a high throughput screen a high content screen image is produced which can be analysed using advanced imaging algorithms to produce a set of variables which reflect the observed activity of the cells within the image. Classification methods have important applications in the analysis of high content screening data where they are used to predict which compounds have the potential to be developed into new drugs. Statistical approaches have been developed that enable classification using a single parameter. However, approaches for multiparametric selection are still in their infancy. Furthermore, proper exploitation of the information contained within each high content screen image will enable more refined compound selection. A new classification technique for the analysis of data from high content screening experiments will be presented and the methodology illustrated on an example data set using a random forest classifier. 



Oct 11  Thu  Michailina Siakalli (University of Sheffield)  Statistics Seminar  
14:00  Stochastic Stabilization  
Hicks Room K14  
Abstract: In simple terms, the stability of a dynamical system concerns its sensitivity to perturbations. Consider a first-order nonlinear differential equation system dx(t)/dt = f(x(t)). When noise is added, it has so far been observed that Brownian motion noise can stabilize an unstable system, or destabilize one that is stable. In my talk I will describe what happens when the given nonlinear system is perturbed by different types of Poisson noise.
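As an illustrative aside (not from the talk): the classical Brownian case can be seen in a few lines. For the linear system dx = a x dt, adding multiplicative noise sigma x dW gives the exact solution x(t) = x0 exp((a - sigma^2/2) t + sigma W(t)), which decays almost surely once sigma^2/2 > a. A minimal sketch, assuming this linear toy model:

```python
import numpy as np

def linear_sde_path(x0=1.0, a=1.0, sigma=3.0, T=10.0, n=10_000, seed=0):
    """Exact solution of dx = a*x dt + sigma*x dW:
    x(t) = x0 * exp((a - sigma**2/2) * t + sigma * W(t))."""
    rng = np.random.default_rng(seed)
    dt = T / n
    w = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))])
    t = np.linspace(0.0, T, n + 1)
    return x0 * np.exp((a - 0.5 * sigma**2) * t + sigma * w)

# Without noise the origin is unstable (x grows like exp(a*t)); with
# sigma**2/2 > a the top Lyapunov exponent a - sigma**2/2 is negative,
# so the noisy path decays towards zero almost surely.
x_free = linear_sde_path(sigma=0.0)
x_noisy = linear_sde_path(sigma=3.0)
```

The Poisson-noise case discussed in the talk requires jump terms and is not reproduced here.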



Nov 8  Thu  Markus Riedle (University of Manchester)  Statistics Seminar  
14:00  Introduction to stochastic delay differential equations  
Hicks Room K14  
Abstract: In recent years, stochastic functional differential equations, or stochastic differential equations with delay, have gained increasing attention in several scientific areas such as economics, biology, physics and medicine. The reason can be found in the observation that in a huge variety of models the evolution of the process describing the dynamics under consideration depends not only on the current state of the process but also on its former states. This effect is due to various causes such as time to maturity, incubation time, time to build, time to transport, hysteresis, delayed feedback and past-dependent volatility. At the beginning of the talk we present some of these applications of stochastic functional differential equations. We introduce the basic ideas of ordinary stochastic differential equations not depending on the past and explain how these equations can be generalised to functional equations covering the examples presented before. The fundamental theory of stochastic functional differential equations is introduced and, in particular, compared with the situation for ordinary stochastic differential equations. In the remaining part of the talk we distinguish several cases according to how the random noise and past dependence enter the equation, focusing on asymptotic aspects of the solution. We present some phenomena known only from delay equations. We also introduce some results which explain the relation between functional and partial stochastic differential equations.



Nov 14  Wed  Alexander J McNeil (Heriot-Watt University)  Statistics Seminar
14:00  A New Perspective on Archimedean Copulas  
Hicks Room K14  
Abstract: The Archimedean copula family is used in a number of actuarial applications, ranging from the construction of multivariate loss distributions to frailty models for dependent lifetimes. We present some new results that contribute to a greater understanding of this family and point the way to improved simulation and estimation procedures. We derive necessary and sufficient conditions for an Archimedean generator function (a continuous, decreasing mapping of the positive half-line to the unit interval) to generate a copula in a given dimension d. We also show how the Archimedean family coincides with the class of survival copulas of L1-norm symmetric distributions. These results allow us to construct a rich variety of new Archimedean copulas in different dimensions and to solve in principle the problem of generating samples from any Archimedean copula. The practical consequences include new models for negatively dependent risks, simple formulas for rank correlation coefficients and diagnostic tests for Archimedean dependence.
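For concreteness, a hedged sketch (not from the talk) of the frailty sampling idea these results generalise, for one classical member of the family: the Clayton generator psi(t) = (1 + t)^(-1/theta) is the Laplace transform of a Gamma(1/theta) frailty V, so setting U_i = psi(E_i / V) with unit exponentials E_i yields a Clayton sample.

```python
import numpy as np

def sample_clayton(n, d=2, theta=2.0, seed=0):
    """Frailty (Marshall-Olkin type) sampler for the Clayton copula.

    The generator psi(t) = (1 + t)**(-1/theta) is the Laplace transform
    of a Gamma(1/theta) frailty V; with E_i iid Exp(1), the vector
    U_i = psi(E_i / V) has the d-dimensional Clayton copula."""
    rng = np.random.default_rng(seed)
    v = rng.gamma(1.0 / theta, size=(n, 1))   # frailty variable
    e = rng.exponential(size=(n, d))          # iid unit exponentials
    return (1.0 + e / v) ** (-1.0 / theta)

u = sample_clayton(20_000, theta=2.0)
```

For Clayton, Kendall's tau equals theta/(theta + 2), so theta = 2 gives strong positive dependence with uniform marginals.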



Nov 22  Thu  Qiwei Yao (London School of Economics)  Statistics Seminar  
14:00  Modelling Multiple Time Series via Common Factors  
Hicks Room K14  
Abstract: We propose a new method for estimating common factors of multiple time series. One distinctive feature of the new approach is that it is applicable to nonstationary time series. The unobservable (nonstationary) factors are identified via expanding the orthogonal complement of the factor loading space step by step, thereby solving a high-dimensional optimization problem by many low-dimensional subproblems. Asymptotic properties of the estimation are investigated, and the proposed methodology is illustrated with both simulated and real data sets.



Nov 29  Thu  Boris Mitavskiy (University of Sheffield)  Statistics Seminar  
14:00  Complexity of Evaluating the Probability Distribution of State Cycles in Finite State Update Networks  
Hicks Room K14  
Abstract: In many situations in biology (gene interactions, metabolic pathways, etc) and communications (mobile phones, WWW) an appropriate model is provided by a digraph in which the nodes (genes, metabolites, phones, computers) are in various states, and these states are updated (at times $t=0, \, 1, \, 2, \ldots$) in response to the states of the ``incoming nodes''. Assuming synchronous updating, the state of the system as a whole, $U(t)$ say, is some function of $U(t-1)$. The dynamics of the system (i.e. the sequence of $U(t)$) can then be described by a directed graph over the possible states, where two states $\mathbf{x}$ and $\mathbf{y}$ are joined if $U(t-1)=\mathbf{x}$ implies $U(t)=\mathbf{y}$. Since the system is finite this directed graph consists of a set of cycles, and a set of trees each rooted (the edges of each tree pointing towards the root) on the cycles. There is much known (but little understood) about these dynamics. In this talk I'll introduce a rigorous simplified model of this scenario and study its basic properties with respect to the distribution of cycle lengths. It turns out that the distribution of fixed points is rather straightforward to compute (and it is the uniform distribution regardless of the network topology!) while the distribution of cycles of length $k$ for any fixed $k \geq 2$ is already an NP-hard question with respect to the size of the underlying digraph. I will provide a brief introduction to the theory of NP-completeness which is sufficient to understand the proofs. If time allows, I will also discuss a constant time algorithm to solve the subproblem where the underlying digraph is an $r$-input regular one.



Feb 7  Thu  John Haslett (Trinity College Dublin)  Statistics Seminar
14:00  Monotone smoothing: application of a compound Poisson-Gamma process to modelling radiocarbon-dated depth chronologies
Hicks Room K14  
Abstract: We propose a new and simple continuous Markov monotone stochastic process and use it for Bayesian monotone smoothing. The process is piecewise linear, based on additive independent Gamma increments arriving in a Poisson fashion. A special case allows very simple conditional simulation of sample paths given known values of the process. We take advantage of a reparameterisation involving the Tweedie distribution to provide efficient MCMC computation. The motivating problem is the establishment of a chronology for samples taken from lake sediment cores; that is, the attribution of a set of dates to samples of the core given their depths, knowing that the age-depth relationship is monotone. The chronological information arises from radiocarbon (14C) dating at a subset of depths. We use the process to model the stochastically varying sedimentation rate.



Feb 14  Thu  Rita Zapata-Vasquez (University of Sheffield)  Statistics Seminar
14:00  Bayesian cost-effectiveness analysis based on a decision-analytic model
Hicks Room K14  
Abstract: The purpose of economic evaluations relating to cost-effectiveness analysis is to provide decision-makers with sufficient evidence to establish the relevance or pertinence of one treatment or strategy over another, or to adjust the results to his/her location of interest. Cost-effectiveness studies based on decision models involve highlighting specific features of previously published studies. However, the lack of evidence, or of consistent reports, is common in many fields. In medicine this is complicated by the fact that it is ethically unacceptable to implement clinical trials that put patients at high risk, or because the cost of such a trial is not affordable. Apart from the specialized literature, another source of information is that which can be obtained from experts through the use of elicitation. Regardless of the origin, from this knowledge judgements are established to represent the uncertainty of the data through the use of probability distributions. A model for assessing the cost-effectiveness of two management strategies for the treatment of intracranial hypertension in children with severe traumatic brain injury is outlined. Some parts of the model structure will be presented, but I will focus on the way that the uncertainty in the parameters (inputs) of the model was formulated as probability distributions, based on the corresponding judgements. Certain dependence relations among inputs will be shown, and how learning about one aspect may change our beliefs. Further, I will comment on how the dependence can be conceived when costs and effects come from different sources.



Feb 14  Thu  Theresa Cain (University of Sheffield)  Statistics Seminar  
14:00  Bayesian Inference for health state utilities using pairwise comparison data  
Hicks Room K14  
Abstract: The National Institute for Health and Clinical Excellence (NICE) makes recommendations about which drugs should be available on the NHS. An important part of this decision is performing a cost-effectiveness analysis. When evaluating the cost-effectiveness of a treatment, it is important to consider the quality of life a patient experiences. The quality of life is described by utility, a measure of preference for a particular health condition. Conventional methods of eliciting utilities such as the Standard Gamble and Time Trade-off involve questions that some respondents might find difficult to answer. An alternative method is to collect discrete choice data, in which respondents simply state which health state they prefer from two alternatives, rather than provide actual utilities. The underlying utilities must be determined given these pairwise choices. We consider Bayesian approaches for inference about population utilities given such pairwise choice data.



Feb 28  Thu  Michael Papathomas (Imperial College London)  Statistics Seminar  
14:00  Obtaining proposal distributions for reversible jump MCMC  
Hicks Room K14  
Abstract: A major difficulty when implementing the reversible jump Markov chain Monte Carlo methodology lies in the choice of good proposals for the parameters of the competing statistical models. We focus on the comparison of non-nested log-linear models and present a novel approach for the construction of proposal distributions.



Mar 13  Thu  Robert Gramacy (University of Cambridge)  Statistics Seminar  
14:00  Importance Tempering  
Hicks Room K14  
Abstract: Simulated tempering (ST) is an established Markov Chain Monte Carlo (MCMC) methodology for sampling from a multimodal density $\pi(\theta)$. The technique involves introducing an auxiliary variable k taking values in a finite subset of [0,1] and indexing a set of tempered distributions, say $\pi_k(\theta) = \pi(\theta)^k$. Small values of k encourage better mixing, but samples from $\pi$ are only obtained when the joint chain for $(\theta,k)$ reaches k=1. However, the entire chain can be used to estimate expectations under $\pi$ of functions of interest, provided that importance sampling (IS) weights are calculated. Unfortunately this method, which we call importance tempering (IT), has tended not to work well in practice. This is partly because the most immediately obvious implementation is naïve and can lead to high variance estimators. We derive a new optimal method for combining multiple IS estimators and prove that this optimal combination has a highly desirable property related to the notion of effective sample size. The methodology is applied in two modelling scenarios requiring reversible-jump MCMC, where the naïve approach to IT fails: model averaging in treed models, and model selection for mark-recapture data.
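An illustrative sketch, not the authors' optimal combination: even a single tempered level pi^k, sampled by random-walk Metropolis and reweighted with importance weights proportional to pi^(1-k), recovers expectations under a bimodal pi.

```python
import numpy as np

def log_pi(x):
    # Unnormalised bimodal target: equal mixture of N(-3, 1) and N(3, 1).
    return np.logaddexp(-0.5 * (x + 3.0) ** 2, -0.5 * (x - 3.0) ** 2)

def importance_tempered_moments(k=0.3, n=50_000, step=2.5, seed=0):
    """Random-walk Metropolis on the tempered density pi^k, followed by
    self-normalised importance weights proportional to pi^(1-k)."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    cur = 0.0
    for i in range(n):
        prop = cur + step * rng.normal()
        if np.log(rng.uniform()) < k * (log_pi(prop) - log_pi(cur)):
            cur = prop
        x[i] = cur
    logw = (1.0 - k) * log_pi(x)       # weights back to the target pi
    w = np.exp(logw - logw.max())
    w /= w.sum()
    return np.sum(w * x), np.sum(w * x**2)

mean_est, second_moment_est = importance_tempered_moments()
# The target has mean 0 and second moment 1 + 3**2 = 10.
```

The tempered chain crosses between the modes easily; the high-variance behaviour the talk addresses shows up when the weights become much more extreme than in this mild example.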



Apr 10  Thu  Oliver Johnson (University of Bristol)  Statistics Seminar  
14:00  Maximum entropy and Poisson approximation  
Hicks Room K14  
Abstract: I will show that the Poisson distribution maximises entropy in the class of ultra log-concave distributions (a class which includes sums of Bernoulli variables). I will also explain how this result relates to bounds in Poisson and compound Poisson approximation.
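The inequality can be checked numerically: a Binomial(n, lambda/n) variable is a sum of n Bernoulli(lambda/n) variables, hence ultra log-concave with mean lambda, and its entropy falls below the Poisson(lambda) entropy. A small sketch (entropies in nats; the parameter values are illustrative):

```python
import numpy as np
from math import lgamma

def poisson_entropy(lam, kmax=200):
    # Truncated sum -sum p_k log p_k for the Poisson(lam) pmf.
    k = np.arange(kmax + 1)
    logp = k * np.log(lam) - lam - np.array([lgamma(i + 1.0) for i in k])
    p = np.exp(logp)
    return -np.sum(p * logp)

def binomial_entropy(n, p):
    # Exact sum -sum q_k log q_k for the Binomial(n, p) pmf.
    k = np.arange(n + 1)
    logc = np.array([lgamma(n + 1.0) - lgamma(i + 1.0) - lgamma(n - i + 1.0) for i in k])
    logq = logc + k * np.log(p) + (n - k) * np.log(1.0 - p)
    q = np.exp(logq)
    return -np.sum(q * logq)

lam = 5.0
h_poisson = poisson_entropy(lam)
# Bin(25, lam/25) is a sum of 25 Bernoulli variables with mean lam,
# hence ultra log-concave; its entropy cannot exceed the Poisson's.
h_binomial = binomial_entropy(25, lam / 25)
```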



Apr 17  Thu  Adam Butler (BioSS Edinburgh)  Statistics Seminar  
14:00  A latent Gaussian model for compositional data with many zeros  
Hicks Room K14  
Abstract: Compositional data record the relative proportions of different components within a mixture, and arise frequently in many fields, including geology, ecology and human health. Standard statistical techniques for the analysis of such data assume the absence of proportions which are genuinely zero, but real data may contain a substantial number of zero values. In this talk I will present a latent Gaussian model for the analysis of compositional data which contain zero values, based on assuming that the data arise from a (deterministic) Euclidean projection of a multivariate Gaussian random variable onto the unit simplex. A simulation study is used to compare three different methods of inference (maximum likelihood estimation, MCMC and approximate Bayesian computation), and the methodology is illustrated using real data on dietary intake.
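The deterministic projection step can be sketched directly, using the standard sort-based algorithm for Euclidean projection onto the simplex (the inference methods compared in the talk are not reproduced here, and the Gaussian parameters are illustrative):

```python
import numpy as np

def project_to_simplex(z):
    """Euclidean projection of z onto the unit simplex
    {x : x_i >= 0, sum_i x_i = 1} via the standard sort-based rule.
    Sufficiently small components of z map exactly to zero, which is
    how the latent Gaussian model produces genuine zero proportions."""
    u = np.sort(z)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, z.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(z - theta, 0.0)

rng = np.random.default_rng(1)
# Latent multivariate Gaussian draws mapped to 4-part compositions:
comps = np.array([project_to_simplex(rng.normal(0.2, 0.4, 4)) for _ in range(1000)])
```

Each projected vector is a valid composition, and a noticeable fraction of its entries are exactly zero.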



Apr 24  Thu  Leszek Roszkowski (University of Sheffield)  Statistics Seminar  
14:00  Bayesian Statistics in Cosmology and Particle Physics  
Hicks Room K14  
Abstract: I will describe two recent applications of Bayesian statistics. In one, main features of our Universe are extracted from studies of cosmic background radiation. In the other, current data is used to speculate about properties of ``new physics'' models based on supersymmetry that will soon be tested in particle physics experiments at the Large Hadron Collider (LHC) at CERN near Geneva. 



May 8  Thu  Owen Jones (University of Melbourne)  Statistics Seminar  
14:00  Looking for continuous local martingales  
Hicks Room K14  
Abstract: Continuous local martingales, or equivalently time-changed Brownian motion, are a popular class of models in finance. We present a set of statistical tests for whether or not an observed process is a continuous time-changed Brownian motion, based on the concept of the crossing tree. We apply our methodology to five currency exchange rates (AUD-USD, JPY-USD, EUR-USD, GBP-USD and EUR-GBP) and show that in each case, when viewed at a moderately large time scale, the log-transformed series is consistent with a continuous local martingale model.



May 22  Thu  Neil O'Connell (University of Warwick)  Statistics Seminar  
14:00  Exponential functionals of Brownian motion and class one Whittaker functions  
Hicks Room K14  
Abstract: Motivated by a problem concerning scaling limits for directed polymers, and recent extensions of Pitman's `2M-X' theorem including an analogue, due to Matsumoto and Yor, for exponential functionals of Brownian motion, we consider (multidimensional) Brownian motion conditioned on the asymptotic law of a family of exponential functionals and identify which laws give rise to diffusion processes. For particular families (with a lot of symmetry) these conditioned processes are related to class one Whittaker functions associated with semisimple Lie groups. The work of Matsumoto and Yor corresponds to the group GL(2,R) and the class one Whittaker function in this case is essentially the Macdonald function (or modified Bessel function of the second kind). For the group GL(3,R) many explicit formulae are available for understanding the behaviour of these processes. The directed polymer problem should correspond to the group GL(n,R) and the asymptotics of the corresponding Whittaker functions for large n, but there are significant technical hurdles to overcome before this can be made fully rigorous. This is based on joint work with Fabrice Baudoin.



Jun 5  Thu  David Lucy (University of Lancaster)  Statistics Seminar  
14:00  
Hicks Room K14  


Oct 9  Thu  Richard Wilkinson (Sheffield)  Statistics Seminar  
14:00  Estimating Species Divergence Times Using the Fossil Record  
Hicks Room K14  
Abstract: In this talk I will show how to estimate species divergence times using the fossil record. I will describe how branching process models can be conditioned to contain subtrees originating at a given point in time, and how these can be used to model evolution taking some known phylogenetic structure into account. Inference can be performed using Approximate Bayesian Computation (ABC) and I will describe a hybrid ABC-Gibbs algorithm that can improve the efficiency of the basic ABC algorithm.
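As a hedged illustration of the basic ABC rejection algorithm mentioned above, on an invented toy problem (inferring a normal mean; the divergence-time model itself is far richer):

```python
import numpy as np

def abc_rejection(data, n_prop=100_000, eps=0.05, seed=0):
    """Basic ABC rejection: draw theta from the prior, simulate the
    summary statistic, and keep theta when the simulated summary lands
    within eps of the observed one."""
    rng = np.random.default_rng(seed)
    obs = data.mean()                       # observed summary statistic
    theta = rng.normal(0.0, 5.0, n_prop)    # draws from a N(0, 25) prior
    # The mean of len(data) iid N(theta, 1) draws is N(theta, 1/len(data)),
    # so the simulated summary can be drawn directly:
    sim = rng.normal(theta, 1.0 / np.sqrt(len(data)))
    return theta[np.abs(sim - obs) < eps]

rng = np.random.default_rng(42)
data = rng.normal(2.0, 1.0, 50)
post = abc_rejection(data)   # approximate posterior sample for theta
```

The accepted draws concentrate around the observed sample mean; the hybrid ABC-Gibbs idea of the talk improves on the low acceptance rate this basic scheme suffers from in harder models.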



Oct 16  Thu  Leo Bastos (University of Sheffield)  Statistics Seminar  
14:00  Diagnostics for Gaussian Process Emulators  
Hicks Room K14  
Abstract: This work presents some diagnostics to validate and assess the adequacy of a Gaussian process emulator as a surrogate for a computer model. These diagnostics are based on comparisons between simulator outputs and Gaussian process emulator outputs for some test data, known as validation data, defined by a sample of simulator runs not used to build the emulator. Our diagnostics take care to account for correlation between the validation data. To illustrate a validation procedure, these diagnostics are applied to two different data sets.



Oct 16  Thu  Tom Fricker (University of Sheffield)  Statistics Seminar  
14:00  Prior specification in Gaussian process emulators: What do we mean by the mean?  
Hicks Room K14  
Abstract: When building an emulator for a computer model, we treat the model output as an unknown deterministic function of the inputs. The data we have are observations of the computer model output at a number of input points, and our task is to make inference about the function using these noiseless data. We use a semiparametric regression model, a priori describing the function as the sum of a parametric mean function and a zero-mean Gaussian process. Often in the past a very basic regression function has been used for the mean (either constant or linear in the inputs), and most of the effort has been spent in correctly specifying the Gaussian process to model the residuals. However, in some quarters it is believed that we should attempt to build more prior information about the computer model into the emulator via the mean function. But individual realisations of a zero-mean Gaussian process do not necessarily have a mean value of zero, so what exactly is meant when we talk about `the prior mean' of the model? How far should we go in the mean function's complexity? What happens if we overfit it? And does this extra effort actually improve the emulator's predictions of the computer model? In this talk I shall use some very simple toy examples to explore these questions (but without necessarily offering any answers...)



Nov 6  Thu  Mark Steel (University of Warwick)  Statistics Seminar  
14:00  Time-Dependent Stick-Breaking Processes
Hicks Room K14  
Abstract: This paper considers the problem of defining a time-dependent nonparametric prior. A recursive construction allows the definition of priors whose marginals have a stick-breaking form. The processes with Poisson-Dirichlet and Dirichlet process marginals have interesting interpretations that are further investigated. We develop a general conditional MCMC method for inference in a wide subclass of these models. We derive a Pólya urn scheme type representation of the Dirichlet process construction. This allows us to develop a marginal MCMC method for this case. The results section shows the relative performance of the two MCMC schemes for the Dirichlet process case and looks at two data examples.
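A sketch of the (marginal) stick-breaking form referred to above, for a plain Dirichlet process; the time-dependent recursive construction of the paper is not reproduced, and the concentration parameter is an illustrative choice:

```python
import numpy as np

def stick_breaking_weights(alpha=2.0, n_sticks=500, seed=0):
    """Stick-breaking weights of a Dirichlet process: V_i ~ Beta(1, alpha)
    and w_i = V_i * prod_{j<i} (1 - V_j), i.e. each V_i breaks off a
    fraction of the stick that remains."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, n_sticks)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    return v * remaining

w = stick_breaking_weights()
```

With a long enough truncation the weights sum to 1 to numerical precision; pairing them with iid draws from a base measure gives a draw from the Dirichlet process itself.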



Nov 13  Thu  Dan Crisan (Imperial College)  Statistics Seminar  
14:00  Sequential Monte Carlo methods: a theoretical perspective
Hicks Room K14  
Abstract: The aim of the talk is to present a bird's-eye view of sequential Monte Carlo methods (including the SIR algorithm and branching algorithms) with emphasis on classical convergence results. Additionally, some recent uniformly convergent particle filters will be discussed. The second part of the talk is based on joint work with K. Heine (see http://www.ma.ic.ac.uk/~dcrisan/crihei2.pdf for details)



Nov 20  Thu  Martin Hairer (University of Warwick)  Statistics Seminar  
14:00  A weak form of Harris's theorem  
Hicks Room K14  
Abstract: Harris' theorem gives easily verifiable conditions for a Markov operator to have a spectral gap in a weighted supremum norm. We are going to show a new elementary proof of this result. This proof can then be generalised to situations where Harris' theorem fails in order to prove a 'weak' form of it. The range of possible applications includes a number of stochastic PDEs and stochastic delay equations. 



Nov 27  Thu  Jon Pitchford (University of York)  Statistics Seminar  
14:00  Is there something fishy about Lévy processes?  
Hicks Room K14  
Abstract: Lévy flights are loosely defined as random walks in which the step lengths are drawn from some underlying power law distribution. In biology, detecting Lévy-like behaviour is worryingly fashionable and interestingly controversial. Do Lévy flights really occur? If so, then why have they evolved? I will discuss possible answers to these questions, arguing that there may be a role for more general Lévy processes in biology and ecology. I will draw on two examples from my recent research: superspreading in epidemics, and stochastic foraging in patchy environments.
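A minimal sketch of a Lévy flight in the loose sense defined above: power-law step lengths (here a Pareto law, with the tail exponent as an illustrative choice) and uniformly random directions.

```python
import numpy as np

def levy_flight(n_steps=10_000, mu=2.0, seed=0):
    """Planar random walk whose step lengths have power-law density
    proportional to l**(-mu) on [1, inf) (a Pareto law with tail index
    mu - 1), with independent uniformly random directions."""
    rng = np.random.default_rng(seed)
    lengths = rng.pareto(mu - 1.0, n_steps) + 1.0   # Pareto, minimum length 1
    angles = rng.uniform(0.0, 2.0 * np.pi, n_steps)
    steps = lengths[:, None] * np.column_stack([np.cos(angles), np.sin(angles)])
    return np.cumsum(steps, axis=0)

path = levy_flight()
```

With mu = 2 the step-length mean is infinite, so the trajectory is dominated by a few very long jumps between clusters of short ones, the signature that detection studies look for.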



Dec 4  Thu  Mike Campbell (University of Sheffield)  Statistics Seminar  
14:00  A statistician on a NICE appraisals committee  
Hicks Room K14  
Abstract: NICE stands for the National Institute for Health and Clinical Excellence. The speaker has been on a NICE Appraisals committee for 7 years. He will describe what the committee does and how NICE makes decisions. Much of the evidence to NICE is statistical and a statistician is an important member of the committee. A number of roles for a statistician will be described. One role is checking for errors and he will describe some he has come across. 



Dec 18  Thu  George Streftaris (Heriot-Watt University)  Statistics Seminar
14:00  Bayesian inference for stochastic epidemic models with non-exponential tolerance to infection
Hicks Room K14  
Abstract: The transmission dynamics of an infectious disease during the outbreak of an epidemic can be stochastically described through a time-inhomogeneous Poisson process, thus assuming exponentially distributed levels of disease tolerance, following the so-called Sellke (1983) construction. In this talk I will present generalisations of the Sellke structure under the susceptible-exposed-infectious-removed (SEIR) class of epidemic models, and focus on a model with Weibull individual tolerance thresholds. Examples of simulated and real epidemic data are discussed, where inference is carried out using MCMC methods following a Bayesian approach to tackle the issue of the partial observation of the temporal course of the epidemic. The adequacy of the models is assessed using methodology based on the properties of Bayesian latent residuals, demonstrating problems with more commonly used model checking techniques.
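An illustrative sketch of the Sellke construction for the final size of a simple SIR epidemic (the SEIR generalisations and the inference are not reproduced; all parameter values are illustrative). Weibull tolerance thresholds with shape 1 recover the classical exponential case:

```python
import numpy as np

def sellke_final_size(n=200, beta=2.0, tol_shape=1.0, seed=0):
    """Final size of an SIR epidemic via the Sellke construction.

    Each susceptible carries a tolerance threshold; an individual is
    infected once the accumulated infection pressure (beta/n times the
    total infectious time so far) exceeds its threshold.  Weibull shape
    1 gives exponential tolerances, i.e. the Poisson-process case."""
    rng = np.random.default_rng(seed)
    q = np.sort(rng.weibull(tol_shape, n))            # tolerance thresholds
    periods = rng.exponential(1.0, n + 1)             # infectious periods
    pressure = beta / n * periods[0]                  # one initial infective
    infected = 0
    while infected < n and q[infected] <= pressure:
        pressure += beta / n * periods[infected + 1]  # new infective's pressure
        infected += 1
    return infected

size_exp = sellke_final_size(tol_shape=1.0)     # exponential tolerances
size_weib = sellke_final_size(tol_shape=2.0)    # Weibull(2) tolerances
```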



Feb 12  Thu  Lindsay Collins (Sheffield)  Statistics Seminar  
14:00  Climate variability and its effect on atmosphere/terrestrial-biosphere carbon fluxes
Hicks LT7  
Abstract: In my PhD I will study the effect of climate uncertainty and variability on vegetation carbon dynamics. Our interest in the terrestrial biosphere lies in the carbon that is released into the atmosphere or stored in the soil through the land vegetation. The Sheffield Dynamic Global Vegetation Model (SDGVM) simulates the terrestrial vegetation processes (including photosynthesis and respiration) and provides estimates of terrestrial carbon fluxes. The SDGVM is driven by monthly climate data. The monthly data are downscaled to daily data within the SDGVM using a weather generator so that the vegetation processes can be calculated daily. I will show how temporal variability leads to differing carbon flux estimates. We aim to quantify the uncertainty in the carbon flux estimates directly linked to uncertainty and variability in the climate data using probabilistic sensitivity analysis (PSA) methods developed by Oakley and O'Hagan (2004), making use of the GEM-SA software developed by Kennedy (2004) for working with complex models such as the SDGVM. I will show how the form of the climate data makes the use of this software less than straightforward and introduce methodology by which a PSA may be possible. This will involve the characterisation of the uncertainty in the climate in terms of parameters that can be used as input to GEM-SA rather than actual data.



Feb 12  Thu  Lu Zou (Sheffield)  Statistics Seminar  
14:00  Multiple Imputations of Bio-Datasets
Hicks LT7  
Abstract: This presentation will start with a brief introduction to two bio-datasets involved in my study. One inevitable issue is that many values are missing in both sets. Rather than ignoring them, imputation is considered. This talk will focus on the imputation of continuous variables which are to be used as biomarkers in two situations: i) a normal randomly-missing situation and ii) a 'file-matching' situation. Several imputation methods are considered: for single imputation, the K-Nearest Neighbours method (KNN) and the EM algorithm are studied; for multiple imputation, Multiple Imputation using Additive Regression, Bootstrapping and Predictive Mean Matching (PMM), and EM imputation combined with resampling methods, are investigated. Based on the studies so far, the EM algorithm is relatively more suitable in my case.



Feb 19  Thu  Andrew Stuart (University of Warwick)  Statistics Seminar  
14:00  Metropolis-Hastings Methods for Sampling Random Functions
Hicks LT7  
Abstract: Many applied problems require the practitioner to obtain information from a probability measure on functions. Examples include signal processing, weather prediction, oceanography, nuclear waste management and oil recovery. I will show that, despite the wide variety of physical phenomena underlying these examples, there is a common mathematical structure which can be exploited in a number of ways. I will highlight how this structure can be used to design efficient MCMC methods to sample from the desired probability measure, generalizing random walk and other Metropolis-Hastings methods to the function space setting.
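One such function-space generalisation of random-walk Metropolis is the preconditioned Crank-Nicolson (pCN) proposal, whose acceptance rule involves only the likelihood and so remains well defined as the discretisation is refined. A toy discretised sketch (the example problem is invented for illustration):

```python
import numpy as np

def pcn_mcmc(log_like, cov_sqrt, dim, beta=0.2, n_iter=20_000, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a Gaussian prior N(0, C).

    Proposal: v = sqrt(1 - beta**2) * u + beta * xi with xi ~ N(0, C).
    This proposal preserves the prior, so the acceptance ratio involves
    only the likelihood and stays non-degenerate as dim grows."""
    rng = np.random.default_rng(seed)
    u = cov_sqrt @ rng.normal(size=dim)
    ll = log_like(u)
    out = np.empty((n_iter, dim))
    for i in range(n_iter):
        v = np.sqrt(1.0 - beta**2) * u + beta * (cov_sqrt @ rng.normal(size=dim))
        llv = log_like(v)
        if np.log(rng.uniform()) < llv - ll:
            u, ll = v, llv
        out[i] = u
    return out

# Toy problem: N(0, I) prior on R^50, one observation y = 2.0 of the
# first coordinate with unit noise; the posterior for that coordinate
# is N(1, 0.5) while the other coordinates keep the prior.
dim = 50
samples = pcn_mcmc(lambda u: -0.5 * (u[0] - 2.0) ** 2, np.eye(dim), dim)
```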



Feb 26  Thu  Mike Titterington (Glasgow)  Statistics Seminar  
14:00  Approximate inference for latent variable models  
Hicks LT7  
Abstract: Likelihood and Bayesian inference are not straightforward for latent variable models, of which mixture models constitute a special case. For instance, in the context of the latter approach, conjugate priors are not available. The talk will consider some approximate methods that have been developed mainly in the machine-learning literature and will attempt to investigate their statistical credentials. In particular, so-called variational methods and the Expectation-Propagation method will be discussed. It will be explained that, in the Bayesian context, variational methods tend to produce approximate posterior distributions that are located in the right place but are too concentrated, whereas the Expectation-Propagation approach sometimes, but not always, gets the degree of concentration, as measured by posterior variance, right as well.
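The 'right place but too concentrated' behaviour can be seen exactly in a bivariate Gaussian example (an editorial illustration, not from the talk): the optimal mean-field factors for N(0, Sigma) are centred correctly but have variance 1/Lambda_ii, which is smaller than the true marginal variance whenever the coordinates are correlated.

```python
import numpy as np

# Target: a zero-mean bivariate Gaussian with correlation rho.
rho = 0.9
cov = np.array([[1.0, rho], [rho, 1.0]])

# The optimal mean-field (factorised Gaussian) variational approximation
# to N(0, cov) has factors centred at the true mean 0, but with variance
# 1 / Lambda_ii where Lambda is the precision matrix, i.e. 1 - rho**2
# here, instead of the true marginal variance 1.
precision = np.linalg.inv(cov)
vb_marginal_var = 1.0 / np.diag(precision)
true_marginal_var = np.diag(cov)
```

The stronger the correlation, the more severe the underestimation, which is the variational pathology the talk contrasts with Expectation-Propagation.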



Mar 5  Thu  David Leslie (Bristol)  Statistics Seminar  
14:00  Posterior weighted reinforcement learning with state uncertainty  
Hicks LT7  
Abstract: Reinforcement learning models are, in essence, online algorithms to estimate the expected reward in each of a set of states by allocating observed rewards to states and calculating averages. Generally it is assumed that a learner can unambiguously identify the state of nature. However in any natural environment the state information is noisy, so that the learner cannot be certain about the current state of nature. Under state uncertainty it is no longer immediately obvious how to perform reinforcement learning, since the observed reward cannot be unambiguously allocated to a particular state of the environment. A new technique, posterior weighted reinforcement learning, is introduced. In this process the reinforcement learning updates are weighted according to the posterior state probabilities, calculated after observation of the reward. We show that this modified algorithm can converge to correct reward estimates, and show the procedure to be a variant of an online expectation-maximisation algorithm, allowing further analysis to be carried out.



Mar 12  Thu  Gareth Roberts (Warwick)  Statistics Seminar  
14:00  Retrospective sampling  
Hicks LT7  
Abstract: This talk will discuss a very simple idea for simulation called retrospective sampling. The method can be applied in the context of many widely used simulation methods such as rejection sampling and MCMC. A number of very simple examples will be described to illustrate the ideas. As time permits, I will give some applications, possibly including exact simulation of diffusion paths and posterior distributions for Dirichlet mixture models. 
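The flavour of the retrospective idea can be conveyed by a toy example (not drawn from the talk itself): to realise an event of probability exp(-lam), draw the uniform variate first and then refine alternating-series bounds on exp(-lam) only until the accept/reject decision is determined, so the target probability is never computed exactly. The function name and set-up are illustrative assumptions:

```python
import random

def accept_prob_exp(lam, uniform=random.random):
    """Decide an event of probability exp(-lam), for 0 < lam < 1, without
    ever evaluating exp(-lam): draw U first, then refine the alternating
    partial sums of the exponential series, which bracket exp(-lam), until
    U falls clearly above or below it (a retrospective-style decision;
    illustrative sketch only)."""
    u = uniform()
    s, term, k = 1.0, 1.0, 0
    while True:
        k += 1
        term *= lam / k
        if k % 2 == 1:
            s -= term
            if u < s:        # s is now a lower bound on exp(-lam): accept
                return True
        else:
            s += term
            if u >= s:       # s is now an upper bound on exp(-lam): reject
                return False
```

Because the terms decrease (lam < 1), the bounds tighten at every step and the loop terminates with probability one after only as much computation as the decision requires.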



Mar 26  Thu  Simon Wilson (Trinity College Dublin)  Statistics Seminar  
14:00  Factor Analysis with a Mixture of Gaussian Factors, with Application to Separation of the Cosmic Microwave Background  
Hicks LT7  
Abstract: Blind source separation is a technique in signal processing where the values of 'sources' are inferred from observations that are linear combinations of them. The typical example is separating two voices (the sources) from a stereo audio recording (each microphone picks up a combination of the two speakers' voices). Both the sources and the matrix of linear 'mixing' coefficients may be unknown. In statistical terms, it is an example of factor analysis, the main difference being that the 'factors' here will have some interpretation and there may exist useful prior information on them. Here we describe an approach to factor analysis/source separation where the sources are assumed to be Gaussian mixtures, which may be independent or dependent, e.g. mixtures of multivariate Gaussians. An MCMC procedure has been developed that implements a fully Bayesian analysis, i.e. it computes the posterior distribution of the sources, their Gaussian mixture parameters and the matrix of linear coefficients from the data. The method is applied to recovery of the cosmic microwave background (CMB), an example of source separation applied to image data. The CMB is one of many sources of extraterrestrial microwave radiation and we observe a weighted sum of these sources from the Earth at different frequencies. Its accurate reconstruction is of great interest to astronomers and physicists since knowledge of its properties, and in particular its anisotropies, will place strong restrictions on current cosmological theories. From the perspective of a Bayesian solution, this application is interesting as there is considerable prior information about the linear coefficients and the sources. Results from the analysis of data from the WMAP satellite will be presented, where microwave radiation is observed at 5 frequencies and separated into sources, including the CMB. A discussion of the many outstanding issues in this problem is also presented. 



Mar 26  Thu  Peter Goos (Antwerp)  Statistics Seminar  
16:00  The optimal design of conjoint choice experiments  
Hicks LT5  
Abstract: Stated preference data are commonly collected by means of conjoint choice experiments or discrete choice experiments in marketing, health economics or environmental economics. The optimal design of these experiments is a challenging research area because of the nonlinearity of the statistical models used to analyze the data. These models include the conditional logit model, the mixed logit model and the nested logit model. In this talk, I will discuss recent advances in the optimal design for such models as well as some of the challenging computational aspects of the optimal design search. 



Apr 2  Thu  Philip Jonathan (Shell Technology Centre Thornton)  Statistics Seminar  
14:00  Modelling spatial and directional effects in extreme value analysis  
Hicks LT7  
Abstract: The characteristics of extreme waves in storm-dominated regions vary systematically with a number of covariates, including location and storm direction. Reliable estimation of the magnitude of extreme events associated with a given return period requires incorporation of covariate effects within extreme value models. A spatio-directional extremes model will be outlined, based on a non-homogeneous Poisson model of peaks over threshold. At each location, a non-parametric estimate of the extreme threshold as a function of storm direction is made. The rate of occurrence of threshold exceedances is modelled as a Poisson process. The size of threshold exceedances is modelled using a generalised Pareto form, the parameters of which vary smoothly in space and are estimated using a roughness-penalised likelihood approach with thin plate splines. The approach will be motivated and illustrated in an application to the estimation of structural design criteria for the Gulf of Mexico. 



Apr 23  Thu  Goran Peskir (Manchester)  Statistics Seminar  
14:00  The British Put-Call Symmetry  
Hicks LT7  
Abstract: I will review recent results and problems arising in the British pricing mechanism. This involves optimal stopping with non-monotone free boundaries. 



Apr 23  Thu  Gennady Samorodnitsky (Cornell)  Statistics Seminar  
15:30  The 2009 Applied Probability Trust Lecture: Large deviations for point processes based on stationary sequences with heavy tails  
Hicks LT7  
Abstract: In many applications involving functional large deviations for partial sums of stationary, but not iid, processes with heavy tails, a curious phenomenon arises: large jumps that are closely grouped together coalesce in the limit, leading to a loss of information about the order in which these jumps arrive. In particular, many functionals of interest become discontinuous. To overcome this problem we move from functional large deviations to point-process-level large deviations. We develop the appropriate topological framework and prove large deviations theorems for point processes based on stationary sequences with heavy tails. We show that these results are useful in many situations where functional large deviations are not. 



Apr 30  Thu  Svetlana Tishkovskaya (Sheffield)  Statistics Seminar  
14:00  Optimal Quantisation in Bayesian Estimation  
Hicks LT7  
Abstract: I consider Bayesian estimation of a parameter of a continuous distribution when the observation space is quantised. Quantisation, as a method of approximating a continuous range of values by a discrete set, arises in many practical situations, including modern methods of digital information processing, data compression, and some data-collection procedures. It is well known that quantising observations reduces the values of convex information functionals. This information loss can be diminished by selecting the optimal partition. I consider two criteria of optimal quantisation in Bayesian estimation: minimum Bayes risk, and minimum information loss as measured by Shannon information. As an alternative to optimal partitioning, whose realisation is often computationally demanding, an asymptotically optimal quantisation is also considered. 



May 7  Thu  Kevin Walters (Sheffield)  Statistics Seminar  
14:00  Are colonic stem cell data consistent with the immortal model of stem cell division under non-random strand segregation?  
Hicks LT7  
Abstract: Stem cells have the potential to revolutionise modern medicine through their regenerative capacity; however, little is known about tissue stem cell differentiation in vivo. Technical advances in laboratory methods have started to provide data that allow us to make simple inferences about tissue stem cell behaviour. This talk will focus on a particular model of stem cell differentiation. 



May 14  Thu  Erika Hausenblas (Salzburg)  Statistics Seminar  
14:00  Stochastic Partial Differential Equations driven by Poisson Random Measure  
Hicks LT7  
Abstract: I will start by pointing out some examples from physics to motivate stochastic partial differential equations (SPDEs). Then I will briefly explain the differences in the dynamics between deterministic partial differential equations and SPDEs. After this motivation I will speak about stochastic integration in Banach spaces and point out the differences from the stochastic integral with respect to the Wiener process. Finally, I will give some results concerning SPDEs driven by Poisson random measures. 



Jun 4  Thu  Katy Klauenberg (Sheffield)  Statistics Seminar  
14:00  Statistical Modelling for Dating Ice Cores  
Hicks LT7  
Abstract: Ice cores drilled through ice sheets in polar regions preserve valuable information about past environment and climate. A pivotal part of interpreting the information held within the cores is to build ice core chronologies, i.e. to relate time to depth. Existing dating methods can be categorised as follows: (1) layer counting using the seasonality in signals, (2) glaciological modelling describing processes such as snow accumulation and plastic deformation of ice, (3) comparison with other dated records, or (4) any combination of these. Conventionally, implementation of these approaches does not use statistical methods. We combine glaciological models with a Bayesian framework. For this purpose, the sources of uncertainty in the glaciological model, and the knowledge about these, are formalised. Additionally, we include information from layer counting and other dated records (e.g. traces from volcanic eruptions) to constrain the resulting dating. During the talk the setup of this statistical model will be described, the effect of uncertainty in the glaciological model will be demonstrated, and the interplay with information from other dating methods will be illustrated. This combined statistical dating approach is applied to date Antarctic ice cores. For the first time the effects of uncertainty implied by the dating method are investigated for ice core chronologies, providing valuable insights for the applied community. 



Oct 1  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Eliciting Probability Distributions  
Hicks LT6  
Abstract: Elicitation is the process of extracting expert knowledge about some unknown quantity of interest and representing that knowledge with a suitable probability distribution. It is an important component of Bayesian inference, risk analysis, and decision-making in the presence of uncertainty. In this talk I will give an introduction to the field and discuss some current research interests, including nonparametric elicitation, the trial roulette method, and SHELF: the Sheffield Elicitation Framework. 



Oct 8  Thu  Nathan Green (Dstl Porton Down)  Statistics Seminar  
14:00  Determining the Source of a Hazardous Atmospheric Release  
Hicks LT6  
Abstract: A methodology is explored for making inference about the parameters of a hazardous atmospheric release from sensor readings. The key difficulty in performing this inference is that the results must be obtained on a very short timescale (5 min) to make use of the inference for protection. The methodology that has been developed uses some of the components of a sequential Monte Carlo algorithm. However, this inference problem differs from many other sequential Monte Carlo problems in that there are no state evolution equations, the forward model is highly nonlinear and the likelihoods are non-Gaussian. Results for inferences made of atmospheric releases (both real and simulated) of material will be presented, demonstrating that the sampling scheme performs adequately despite the constraints of a short time span for calculations. Potential future developments and issues will also be discussed to show areas of future research interest. 



Oct 22  Thu  Tim Heaton (Sheffield)  Statistics Seminar  
14:00  Reconstructing a Wiener process from observations at imprecise times: Bayesian radiocarbon calibration  
Hicks LT6  
Abstract: For accurate radiocarbon dating, it is necessary to identify fluctuations in the level of radioactive carbon (14C) present in the atmosphere through time. The processes underlying these variations are not understood and so a data-based calibration curve is required. In this talk we present a novel MCMC approach to the production of the internationally agreed curve and the individual challenges involved. Our methodology models the calibration data as noisy observations of a Wiener process and updates sample paths through use of a Metropolis-within-Gibbs algorithm. Implementation of this algorithm is complicated by certain specific features of the data used, namely that many data points:
• relate to the mean of the Wiener process over a period of time rather than at a specific point,
• have calendar dates found using methods (e.g. Uranium-Thorium) which are themselves uncertain,
• have ordering constraints and correlations in their calendar date uncertainty - for example, data are sampled along the same core or have floating calendar dates matched to another sample for which the calendar age is more accurately known.
We give an overview of these issues and discuss their implications for the resulting sampler. 



Oct 29  Thu  Jianxin Pan (Manchester)  Statistics Seminar  
14:00  Modelling of MeanCovariance Structures for Longitudinal Data  
Hicks LT6  
Abstract: It is well known that when analysing longitudinal data, misspecification of covariance structures may lead to very inefficient or even biased estimators of parameters in the mean structure. Covariance structures, like the mean, can be modelled using linear or non-linear regression techniques. Various estimation methods have recently been developed for modelling mean and covariance structures simultaneously. In this talk, I will introduce such methods for modelling mean-covariance structures for longitudinal data, including linear and non-linear regression models, variable selection, semiparametric models, etc. Real examples and simulation studies will be presented for illustration. 



Nov 5  Thu  Stanislav Volkov (Bristol)  Statistics Seminar  
14:00  The simple harmonic urn  
Hicks LT6  
Abstract: The simple harmonic urn is a discrete-time stochastic process on Z^2 approximating the phase portrait of the harmonic oscillator using very basic transition probabilities on the lattice, incidentally related to the Eulerian numbers. The urn we consider can be viewed as a two-colour generalised Pólya urn with negative-positive reinforcements, and in a sense as a "marriage" between the Friedman urn and the OK Corral model, where we restart the process each time it hits the horizontal axis by switching the colours of the balls. We show the transience of the process using various couplings with birth-and-death processes and renewal processes. It turns out that the simple harmonic urn is just barely transient, as a minor modification of the model makes it recurrent. We also show links between this model and oriented percolation, as well as some other interesting processes. This is joint work with Edward Crane, Nicholas Georgiou, Rob Waters and Andrew Wade. 



Nov 12  Thu  Vassili Kolokoltsov (Warwick)  Statistics Seminar  
14:00  SDEs driven by non-linear Lévy noise with application to the construction of Markov processes with a given generator  
Hicks LT6  


Nov 26  Thu  David Sexton (The Met Office)  Statistics Seminar  
14:00  Making probabilistic climate projections for the UK  
Hicks LT6  
Abstract: UKCP09, the latest set of climate projections for the UK, was released on June 18th 2009. For the first time the climate projections for the UK are probabilistic, making them an appropriate tool for people taking a risk-based approach to policy and decision making. I will describe how the probabilities were estimated using a) a combination of climate model ensembles which explore parameter uncertainty in different components of the Earth System, b) a set of international climate models other than the Met Office Hadley Centre model, and c) a Bayesian framework which combines this climate model output with observations to provide probabilities that are relevant to the real world and therefore to risk-based decision making. I will also outline the main areas of the production system that could benefit from further research into statistical methods and better experimental design. 



Dec 3  Thu  David Percy (Salford)  Statistics Seminar  
14:00  Predictive elicitation of subjective prior distributions  
Hicks LT6  
Abstract: This seminar tackles the problem of specifying subjective prior distributions for unknown model parameters. We first review strategies for selecting families of priors for common models, including univariate and multivariate probability distributions, generalized linear models and stochastic processes. We then consider methods for evaluating the hyperparameters of these prior distributions. Specifically, we focus on predictive elicitation using quantiles and cumulative probabilities, illustrating the natural beauty and philosophical benefits of this approach. We discuss problems relating to inherent constraints and computational difficulties, and conclude that some compromise is necessary. We illustrate the technique in applications from sport, medicine and industry. 



Dec 10  Thu  Lesley Morrell (Leeds)  Statistics Seminar  
14:00  Modelling the Selfish Herd: Behavioural mechanisms for aggregation in animals  
Hicks LT6  
Abstract: The theory of the selfish herd (WD Hamilton, 1971) has been highly influential to our understanding of animal aggregation. Hamilton proposed that in order to reduce its risk of predation, an individual should approach its nearest neighbour, reducing its risk at the expense of those around it. Despite extensive empirical support, the selfish herd hypothesis has been criticised on theoretical grounds: approaching the nearest neighbour does not result in the observed dense aggregations, and the nearest neighbour in space is not necessarily the one that can be reached fastest. To combat these problems, increasingly complex movement rules have been proposed, successfully producing dense aggregations of individuals, yet various questions remain unanswered. Is one movement rule always the most successful? How do ecological parameters such as the size and density of the group affect rule success? Is the behaviour of the predator important? Should all individuals within a group use the same rule, or should they adjust their behaviour based on where in the group they are, or in response to the behaviour of others? We use simulation models of animal groups to investigate these questions, and demonstrate that there is no rule that performs best under all circumstances: the ecology of the predator and prey are both key in determining how animals should respond to a predation attempt. 



Dec 17  Thu  Ben Youngman (Sheffield)  Statistics Seminar  
14:00  Modelling phenomena using different data sources  
Hicks LT6  
Abstract: Structures must be built strong enough to withstand day-to-day wear and tear but also, ideally, all levels of extreme punishment. In practice, however, economic considerations require some trade-off between strength and susceptibility to damage to avoid costs spiralling. As the largest events are expected to be the most damaging, there is motivation to estimate the distribution of extremes by, for example, estimating the probability of exceeding a certain high level. This is a typical problem in extremal analyses. More recently this problem has been extended by seeking estimates of extremal distributions over space, which is the topic of this talk, though here matters will be further complicated by spatio-temporally sparse data. To combat this, data obtained via different methods, yet in theory quantifying the same phenomenon, will be modelled simultaneously. Extreme value theory will be drawn upon to tackle this problem. This talk begins with an introduction to the topic and progresses by applying some of the ideas discussed. 



Dec 17  Thu  Afzalina Azmee (Sheffield)  Statistics Seminar  
14:00  Two-stage testing in three-arm non-inferiority trials  
Hicks LT6  
Abstract: The aim of a non-inferiority trial is to show that the new experimental treatment is not worse than the reference treatment by more than a certain, predefined margin. We consider the design of a three-arm non-inferiority trial, where the inclusion of a placebo group is permissible. The widely used three-arm non-inferiority procedure was first described authoritatively by Pigeot et al. (2003); it involves establishing superiority of the reference against placebo in the first stage before testing non-inferiority of the experimental treatment against the reference in the second stage. If this preliminary test fails, the second-stage test has to be abandoned. In such an eventuality, we believe the whole study is wasted, as nothing new can be learnt about the new experimental treatment. Therefore, instead of showing superiority in the first stage, we propose that the reference treatment has to be significantly different from placebo as a prerequisite before using Fieller's confidence interval to assess non-inferiority. This procedure leads to no peculiar intervals (i.e. exclusive or imaginary) and offers easy interpretation regarding the efficacy of the experimental and reference treatments. 



Feb 11  Thu  Jonathan Jordan (Sheffield)  Statistics Seminar  
14:00  Geometric preferential attachment graphs  
Hicks K14  
Abstract: Preferential attachment (or "scale-free") random graphs, in which a growing network develops by new vertices attaching preferentially to existing vertices which already have a high degree, were proposed, originally by Barabási and Albert, as models for networks appearing in a wide range of contexts (including biological, technological and social) in which examination of data often reveals an approximately power law distribution of vertex degrees. It was rigorously shown by Bollobás et al. that preferential attachment graphs do indeed have this property. In many of the contexts in which random graph models are used it makes sense for the vertices to have some location in space. The original preferential attachment model has no spatial element, and in this talk I will describe a model which combines a preferential attachment element with a spatial element. I will describe results which show that under certain conditions on the spatial element the power law degree property is retained. I intend that most of the talk should be accessible to an applied audience, though there will be a few slides discussing my proof method. 
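For readers unfamiliar with the non-spatial model the talk builds on, a minimal sketch of degree-proportional attachment (the Barabási-Albert mechanism) is given below; the function and its details are illustrative assumptions, not the spatial model of the talk:

```python
import random

def preferential_attachment(n, m=1, seed=None):
    """Grow a Barabási-Albert-style graph: each new vertex attaches to m
    existing vertices chosen with probability proportional to their degree.
    Minimal illustrative sketch. Returns an edge list; `targets` holds one
    entry per half-edge, so uniform sampling from it is exactly
    degree-proportional sampling."""
    rng = random.Random(seed)
    edges = [(0, 1)]                 # start from a single edge
    targets = [0, 1]                 # one entry per half-edge
    for v in range(2, n):
        # sample m distinct attachment targets, degree-proportionally
        chosen = {rng.choice(targets) for _ in range(m)}
        for w in chosen:
            edges.append((v, w))
            targets.extend([v, w])   # both endpoints gain a half-edge
    return edges

g = preferential_attachment(1000, m=1, seed=42)
```

Plotting the degree distribution of such a graph on log-log axes shows the approximate power law tail mentioned in the abstract; the spatial model discussed in the talk modifies the attachment probabilities with a location-dependent factor.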



Feb 18  Thu  Vincent Macaulay (Glasgow)  Statistics Seminar  
14:00  Inference about past human migration episodes from modern DNA data  
Hicks K14  
Abstract: One view of human prehistory is of a set of punctuated migration events across space and time, associated with settlement, resettlement and discrete phases of immigration. It is pertinent to ask whether the variability that exists in the DNA sequences of samples of people living now, something which can be relatively easily measured, can be used to fit and test such models. Population genetics theory already makes predictions of patterns of genetic variation under certain very simple models of prehistoric demography. In this presentation I will describe an alternative, but still quite simple, model designed to capture more aspects of human prehistory of interest to the archaeologist, show how it can be rephrased as a mixture model, and illustrate the kinds of inferences that can be made on a real data set, taking a Bayesian approach. 



Feb 25  Thu  Mark Broom (Sussex)  Statistics Seminar  
14:00  Models of evolution on structured populations with asymmetry  
Hicks I19  
Abstract: We investigate two examples of models of populations with structure, involving asymmetry. These are different in character, with the common theme that both the structure and the asymmetry have an important influence on population outcomes. The first part of the talk concerns the study of evolutionary dynamics on populations with some non-homogeneous structure, a topic in which there is rapidly growing interest. We investigate the case of undirected, equally weighted graphs and find solutions for the fixation probability of a single mutant in two classes of simple graphs. This process is a Markov chain and we prove several mathematical results. For example, we prove that for all but a restricted set of graphs, (almost) all states are accessible from the possible initial states. To find the fixation probability of a line graph we relate it to a two-dimensional random walk which is not spatially homogeneous. We investigate our solutions numerically and find that for mutants with fitness greater than the resident, the existence of an asymmetric population structure helps the spread of the mutants. Thus it may be that models assuming well-mixed populations consistently underestimate the rate of evolutionary change. In the second part we consider a model of kleptoparasitism, the stealing of food from one animal by another. The handling process of food items can take some time and the value of such items can vary depending upon how much handling an item has received. Furthermore, this information may be known to the handler but not the potential challenger, so there is an asymmetry between the information possessed by the two competitors. We use game-theoretic methods to investigate the consequences of this asymmetry for continuously consumed food items, depending upon various natural parameters. A variety of solutions are found, and there are complex situations where three possible solutions can occur for the same set of parameters. 
It is also possible to have situations in which members of the population exhibit different behaviours from each other. We find that the asymmetry of information often appears to favour the challenger, despite the fact that it possesses less information than the challenged individual. 



Mar 4  Thu  Jonty Rougier (Bristol)  Statistics Seminar  
14:00  Uncertainty and Risk in Natural Hazards  
Hicks K14  
Abstract: In natural hazards (volcanoes, earthquakes, floods, etc.) it is useful for modelling purposes to make a distinction between aleatory and epistemic uncertainty, where the former represents the inherent or natural uncertainty of the hazard, and the latter represents everything else. Natural hazards scientists are often reluctant to quantify epistemic uncertainty with probability, due in large part to its subjective nature. But this challenge should be weighed against the additional problems that non-quantified uncertainty creates for the risk manager and the policymaker. This talk explores these issues in the light of the recent NERC scoping study on natural hazards uncertainty and risk. 



Mar 11  Thu  John Aston (Warwick)  Statistics Seminar  
14:00  Using Functional Principal Component Analysis and Mixed Effect Models to Analyse Spoken Language  
Hicks I19  
Abstract: Fundamental frequency (F0, broadly 'pitch') is an integral part of spoken human language; however, a comprehensive quantitative model for F0 can be a challenge to formulate due to the large number of effects, and interactions between effects, that lie behind the human voice's production of F0, and the very nature of the data being a contour rather than a point. A semiparametric functional response model for F0 will be formulated by incorporating linear mixed effects models through the functional principal component scores. This model is applied to the problem of modelling F0 in tone languages such as Mandarin and Qiang (a dialect from China), languages in which relative pitch information is part of each word's dictionary entry. 



Mar 18  Thu  Norman Fenton (Queen Mary)  Statistics Seminar  
14:00  Uncertainty, Risk and Decision Making  
Hicks K14  
Abstract: Current approaches to uncertain reasoning and risk assessment are often fundamentally flawed. Motivated by real examples from the law and medicine (including a murder trial and a medical negligence trial in which I was an expert witness), I will explain how such flawed reasoning can be avoided by adopting a Bayesian approach. I will introduce the notion of subjective probability and Bayes theorem and argue that this is the only rational approach for handling uncertainty. The problem with this approach is how to scale it up to complex risk assessment problems involving many causally related factors. I will introduce the notion of Bayesian nets and show how they address this problem. I will demonstrate how we have used Bayesian nets in a range of real applications, including legal arguments, medical risk assessment, and software risk assessment. 



Apr 15  Thu  John McColl (Glasgow)  Statistics Seminar  
14:00  Assessment and Feedback in Statistics Courses  
Hicks K14  
Abstract: Giving useful feedback to students about their work ought to be an integral part of the teaching, learning and assessment process, so that learners know where they went wrong and what they can do to improve in the future. In the National Student Survey, student ratings of assessment and feedback are generally less favourable than those for other aspects of their experience, suggesting that this is an area in which UK Higher Education needs to improve. Until now, there has been little discussion about how best to produce effective feedback for the different assessment methods used in modern Statistics courses. This talk will summarise the characteristics of effective feedback, as described in the research literature, and will indicate how these guidelines can be applied to the assessment of data-analysis tasks in Statistics courses. We will then present results from a small study of students in one Statistics course at the University of Glasgow under two conditions: one where feedback was given 'as usual' and the other where feedback was given in accordance with the principles of effective feedback. Finally, we will introduce a freely available, web-based quiz system which has been designed to give tailored feedback to multiple choice questions in a Statistics setting. 



Apr 22  Thu  Piotr Fryzlewicz (London School of Economics)  Statistics Seminar  
14:00  Thick-pen transformation for time series  
Hicks K14  
Abstract: Traditional visualisation of time series data often consists of plotting the time series values against time and "connecting the dots". We propose an alternative, multiscale visualisation technique, motivated by the scale-space approach in computer vision. In brief, our method also "connects the dots", but uses a range of pens of varying thicknesses for this purpose. The resulting multiscale map, termed the Thick-Pen Transform (TPT), corresponds to viewing the time series from a range of distances. We formally prove that the TPT is a discriminatory statistic for two Gaussian time series with distinct correlation structures. Further, we show interesting possible applications of the TPT to measuring cross-dependence in multivariate time series, and to testing for stationarity. In particular, we derive the asymptotic distribution of our test statistic, and argue that the test is applicable to both linear and nonlinear processes under low moment assumptions. Various other aspects of the methodology, including other possible applications, are also discussed. 
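A simplified version of the pen idea can be sketched as running maxima and minima over windows of increasing width: a pen of thickness t drawn along the series traces an upper and a lower boundary at each point. This follows one common formulation of the TPT and may differ in detail from the talk's definition; names are illustrative assumptions:

```python
def thick_pen_transform(x, thicknesses):
    """Simplified Thick-Pen Transform sketch: for each pen thickness t,
    record the upper and lower boundaries of the series drawn with a pen
    covering t+1 consecutive points (a running max/min over a moving
    window). Returns {t: (lower, upper)}. Illustrative only; the talk's
    exact definition may differ."""
    tpt = {}
    n = len(x)
    for t in thicknesses:
        upper = [max(x[i:i + t + 1]) for i in range(n - t)]
        lower = [min(x[i:i + t + 1]) for i in range(n - t)]
        tpt[t] = (lower, upper)
    return tpt

# Usage: a short series viewed at two "distances" (pen thicknesses).
series = [0.0, 1.0, 0.5, 2.0, 1.5]
maps = thick_pen_transform(series, thicknesses=[1, 2])
```

As the thickness grows, the gap between the boundaries widens according to the local variability of the series, which is what makes the collection of maps informative about correlation structure across scales.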



Apr 29  Thu  Andrew Wade (Strathclyde)  Statistics Seminar  
14:00  Non-homogeneous random walks with asymptotically zero drifts  
Hicks K14  
Abstract: For this talk a random walk is a discrete-time, time-homogeneous Markov process on d-dimensional Euclidean space. If such a random walk is spatially homogeneous, its position can be expressed as a sum of independent identically distributed random vectors. Such homogeneous random walks are classical and the literature devoted to their study is extensive, particularly when the state space is the d-dimensional integer lattice. The most subtle case is when the mean drift (i.e., average increment) of the walk is zero. The assumption of spatial homogeneity, while simplifying the mathematical analysis, is not always realistic for applications. Thus it is desirable to study non-homogeneous random walks. As soon as the spatial homogeneity assumption is relaxed, the situation becomes much more complicated. Even in the zero-drift case, a non-homogeneous random walk can behave completely differently to a zero-drift homogeneous random walk, and can be transient in two dimensions, for instance. Such potentially wild behaviour means that results for non-homogeneous random walks often have to be stated under rather restrictive conditions, and techniques from the study of homogeneous random walks are difficult to apply. I will give an introduction to some of the known results on non-homogeneous random walks with asymptotically zero mean drift, that is, where the magnitude of the drift at a point tends to 0 as the distance of that point from the origin tends to infinity. It turns out that this is the natural regime in which to look for important phase transitions in asymptotic behaviour. This includes work by Lamperti in the 1960s on recurrence/transience behaviour. I will also discuss recent joint work with Iain MacPhee and Mikhail Menshikov (Durham) concerned with angular asymptotics, i.e., exit-from-cones problems. 
We show that, in contrast to recurrence/transience behaviour, the angular properties of non-homogeneous random walks are remarkably well-behaved in some sense in the asymptotically zero drift regime. 



May 6  Thu  Andy Wood (Nottingham)  Statistics Seminar  
14:00  Fractals, self-similarity and the estimation of fractal dimension: a statistical perspective.  
Hicks K14  
Abstract: The first part of the talk will give an elementary introduction to fractals, and will include discussion of what they are, some of the various ways in which they can arise and why they are of interest. Relevant concepts such as self-similarity will also be explained. The second part of the talk will briefly discuss statistical estimation of the dimension of a random fractal generated as a realisation of a suitable continuous-time stochastic process, which is observed on a finite grid. The estimation of fractal dimension is of theoretical and practical interest in a number of contexts. The asymptotic framework relevant here is "infill" asymptotics, and the limit theory for fractal dimension estimators in this setting can be quite nonstandard. 
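One concrete example of the kind of estimator discussed is a variogram-based sketch, assuming the observed series behaves locally like fractional Brownian motion with Hurst index H, so that the graph has fractal dimension D = 2 - H. This is a generic illustration of the approach, not a method attributed to the speaker:

```python
import math

def variogram_dimension(x, max_lag=4):
    """Estimate the fractal dimension of the graph of a series by
    regressing log mean (X_{t+h} - X_t)^2 on log h: under an fBm-type
    model the slope is 2H, and the graph dimension is D = 2 - H."""
    logs_h, logs_v = [], []
    for h in range(1, max_lag + 1):
        diffs = [(x[i + h] - x[i]) ** 2 for i in range(len(x) - h)]
        logs_h.append(math.log(h))
        logs_v.append(math.log(sum(diffs) / len(diffs)))
    n = len(logs_h)
    mh, mv = sum(logs_h) / n, sum(logs_v) / n
    slope = (sum((a - mh) * (b - mv) for a, b in zip(logs_h, logs_v))
             / sum((a - mh) ** 2 for a in logs_h))
    return 2 - slope / 2  # D = 2 - H
```

For a perfectly smooth path (e.g. a straight line) the estimate is 1, the dimension of a smooth curve.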



May 13  Thu  Philip O'Neill (Nottingham)  Statistics Seminar  
14:00  Stochastic models and data analysis for healthcare-associated infections  
K14  
Abstract: Antibiotic-resistant pathogens such as MRSA and VRE are of considerable importance in healthcare settings in terms of both clinical and economic impact. In this talk we describe analyses of highly detailed datasets taken from hospital studies looking at, among other things, the effectiveness of control measures and the effect of undetected carriage. The methods involve formulating appropriate stochastic transmission models whose parameters are then estimated using MCMC methods. 



May 20  Thu  Kate Ren (Sheffield)  Statistics Seminar  
14:00  Incorporating Prior Information into Clinical Trial Designs  
Hicks I19  


May 20  Thu  Peter Gregory (Sheffield)  Statistics Seminar  
14:00  Looking for a simple solution to a simple problem: Bayesian modelling of positively skewed data  
Hicks I19  
Abstract: The motivation for this research was a medical cost data set from a clinical trial. If the proposed new intervention were to be accepted by a Regulatory Body, then a Health Care Provider would have to budget for future treatments for some members of the rest of the population. In this Bayesian analysis we want to determine the expected value for one unobserved member of this population from its posterior predictive distribution, by first establishing the parametric data model that best captures the positive-skew characteristics of the costs. We then develop a novel approach to modelling the priors that enables an expert's prior beliefs to be elicited while permitting a limited analytical study of the model. These techniques have been applied to recent medical data sets to assess their efficiency relative to classical estimators. 



May 20  Thu  Jonty Rougier (Bristol)  Statistics Seminar  
15:30  Complex systems: Accounting for model limitations  
Hicks I19  
Abstract: Many complex systems, notably environmental systems like climate, are highly structured, and numerical models, known as simulators, play an important role in prediction and control. It is crucial to account for limitations in simulators, since these can be substantial, and can vary substantially from one simulator to another. These limitations can be categorised in terms of input uncertainty, parametric uncertainty, and structural uncertainty. The talk explains this framework, and the particular challenge of accounting for simulator limitations in dynamical systems, using illustrations from a low-order model for glacial cycles. 



May 27  Thu  Graeme Sarson (Newcastle)  Statistics Seminar  
14:00  Forward models of prehistoric population dynamics  
Hicks K14  


Sep 30  Thu  Kostas Triantafyllopoulos (Sheffield)  Statistics Seminar  
14:00  Multivariate stochastic volatility modelling using Wishart autoregressive processes  
Lecture Theatre 6  
Abstract: This talk will discuss some of the research I conducted while on study leave. In particular, a new multivariate stochastic volatility estimation procedure for financial time series will be developed. A Wishart autoregressive process is considered for the volatility precision covariance matrix, for the estimation of which a two-stage procedure is adopted. In the first stage conditional inference on the autoregressive parameters is developed, and in the second stage unconditional inference is developed, based on a Newton-Raphson iterative algorithm. The proposed methodology, suitable for medium-dimensional data, bridges the gap between closed-form estimation and simulation-based estimation algorithms in stochastic volatility modelling. Two examples, consisting of foreign exchange rates data and of data from the common constituents of the Dow Jones 30 Industrial Average index, illustrate the proposed methodology; for both examples we discuss asset allocation, using mean-variance portfolio optimization as a performance indicator. In this talk we will discuss Wishart processes, which may be of interest in their own right or for applications beyond finance. 



Oct 21  Thu  Lauren Rodgers (Forensic Science Service)  Statistics Seminar  
14:00  A continuous model for deconvoluting DNA mixtures  
Lecture Theatre 6  
Abstract: There are numerous problems encountered in the interpretation and evaluation of DNA profiles, particularly when there is more than one contributor. The current statistical methods are based on binary models and make limited use of the quantitative information contained in the profile. We have developed a continuous model which can probabilistically take account of allelic dropout, allelic stutter and the amplification efficiency of an allele given its molecular weight. This presentation will include: an overview of DNA profiling; a description of our proposed continuous model; and some illustrative calculations with DNA mixtures. 



Oct 28  Thu  Richard Boys (Newcastle)  Statistics Seminar  
14:00  Linking systems biology models to data  
Lecture Theatre 6  
Abstract: This talk considers the assessment and refinement of a dynamic stochastic process model of the cellular response to DNA damage. The proposed model is a complex nonlinear continuous-time latent stochastic process. It is compared to time course data on the levels of two key proteins involved in this response, captured at the level of individual cells in a human cancer cell line. The primary goal is to "calibrate" the model by finding parameters of the model (kinetic rate constants) that are most consistent with the experimental data. Significant amounts of prior information are available for the model parameters. It is therefore most natural to consider a Bayesian analysis of the problem, using sophisticated MCMC methods to overcome the formidable computational challenges. 



Nov 18  Thu  Piotr Fryzlewicz (London School of Economics)  Statistics Seminar  
14:00  Haar-Fisz methodology for interpretable estimation of large, sparse, time-varying volatility matrices  
Lecture Theatre 6  
Abstract: The emergence of the recent financial crisis, during which many markets underwent changes in their statistical structure over a short period of time, illustrates the importance of nonstationary modelling in financial time series. We start this talk by advocating a simple nonstationary multivariate model for financial returns. One task of critical importance to a financial analyst is accurate estimation of the volatility matrix, and in our model this will be a time-varying quantity. Our estimation method is based on Haar wavelet thresholding, supplemented with the essential variance-stabilising Fisz transform (hence the name Haar-Fisz). Thanks to the use of Haar wavelets, our estimator: (a) has a natural in-built sparsity, i.e. local cross-market correlations are naturally estimated as zero wherever possible, which enhances the invertibility of the estimated matrix; (b) adequately captures sudden regime changes; (c) is theoretically tractable, also in the pointwise sense; (d) is rapidly computable, which is important if the matrix is large. In addition, we take advantage of the nonlinearity of wavelet thresholding to propose two distinct versions of the estimator, one of which is based on the polarisation identity. We use real-data examples to illustrate our methodology. 
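For intuition, one level of a Haar transform with a Fisz-type variance-stabilising division can be sketched as below. The denominator shown corresponds to a Poisson-type mean-variance relation; the volatility setting of the talk uses a different stabilising law, so this is background illustration only:

```python
import math

def haar_fisz_level(x):
    """One level of a Haar decomposition with a Fisz-type stabilisation:
    pairwise smooth a = (x1 + x2)/2 and detail d = (x1 - x2)/2, then the
    stabilised coefficient f = d / sqrt(a) (Poisson-type variance law).
    Sketch only, for a nonnegative series of even length."""
    smooth, fisz = [], []
    for x1, x2 in zip(x[0::2], x[1::2]):
        a, d = (x1 + x2) / 2, (x1 - x2) / 2
        smooth.append(a)
        fisz.append(d / math.sqrt(a) if a > 0 else 0.0)
    return smooth, fisz
```

Iterating on the smooth coefficients gives the full multiscale decomposition, on which thresholding is then applied.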



Nov 25  Thu  Samuel Touchard (Sheffield)  Statistics Seminar  
14:00  Forecasting pollution levels using Dynamic Linear Models  
Lecture Theatre 6  
Abstract: In this talk, I will try to forecast the pollution levels of five pollutants from 8 years of data. The model I used is a Dynamic Linear Model (DLM), a regression model in which the parameter vector is no longer assumed constant over time. Also, 3 covariates (humidity, temperature, wind speed) will be used to obtain a better estimate. After introducing the issue of pollution, I will describe the model, first in the univariate case and then in the multivariate case. Then I will apply the model to the data, comment on the results, discuss how they could be improved, and give some ideas for further work. 
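The univariate idea can be illustrated with the simplest DLM, the local level model, filtered by the standard Kalman recursions. This is a minimal sketch; the model in the talk additionally carries regression covariates:

```python
def dlm_local_level_filter(y, V, W, m0=0.0, C0=1e6):
    """Kalman filter for the local level DLM:
    y_t = theta_t + v_t, v_t ~ N(0, V);  theta_t = theta_{t-1} + w_t, w_t ~ N(0, W).
    Returns the one-step-ahead forecasts f_t."""
    m, C = m0, C0
    forecasts = []
    for yt in y:
        a, R = m, C + W          # prior for theta_t
        f, Q = a, R + V          # one-step forecast and its variance
        forecasts.append(f)
        K = R / Q                # Kalman gain
        m = a + K * (yt - f)     # posterior mean update
        C = R - K * R            # posterior variance update
    return forecasts
```

With a diffuse initial variance, the first forecast is just the prior mean, and subsequent forecasts track the data adaptively.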



Dec 2  Thu  Andrew Parnell (University College Dublin)  Statistics Seminar  
14:00  Faster joint posterior modelling through marginal posterior mixtures  
Lecture Theatre 6  
Abstract: We discuss the issue of creating a joint posterior distribution for a set of parameters when only marginal posteriors are available (or are reasonable to compute). More specifically, for data $x$ and parameters $\theta$ in $R^n$, we require $\pi(\theta|x)$ from the marginal data posteriors $\pi(\theta_i|x_i)$. Through a simple adjustment of Bayes' theorem we can use $\pi(\theta_i|x_i)$ to inform the joint posterior, provided $\pi(\theta_i)$ and $\pi(\theta)$ (the marginal and joint priors, respectively) are, in some sense, compatible. 
The technique can be further enhanced by treating $\pi(\theta_i|x_i)$ as a mixture of distributions conjugate to the joint prior. In most cases, it is trivial to approximate any marginal posterior distribution as such a mixture. When the joint prior is Gaussian, the resulting posterior can then be obtained extremely quickly via any one of a number of standard Bayesian computational techniques. 
We apply this technique to two problems in palaeoclimatology (both described in Haslett et al 2006). The first involves long-tailed random walk smoothing of temporal climate histories ($c(t)$) created from pollen sediment cores where pollen is sampled at $n$ layers $y_i$, $i=1,\dots,n$. The marginal posteriors $\pi(c_i|y_i)$ are easily obtained by other means, whereas the random walk gives flat marginal prior distributions $\pi(c_i)$. We obtain the joint posterior $\pi(c|y)$ in a two-stage process without resorting to more burdensome computational methods. The second problem involves spatial forward modelling of pollen changes given modern climate data (also known as response surface modelling; Huntley et al 1993). Here, the marginal posteriors are Gaussian surfaces with few hyperparameters; they are relatively quick to create. The joint posterior surface then becomes a mixture of Gaussian processes. Again, the two-stage process dramatically decreases the computational burden, and allows for parallelisation. 
The models we propose have much in common with Rue et al (2009) and Holmstrom and Erasto (2002). The technique seems widely applicable across the field of statistical modelling. We explore some of the extensions which may allow for higher-dimensional models or more complex prior distributions. 
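One way to write the "simple adjustment of Bayes' theorem" mentioned above is the following; this is my reading of the abstract, assuming the data split as $x = (x_1,\dots,x_n)$ with each $x_i$ informative about $\theta_i$ only:

```latex
\pi(\theta \mid x) \;\propto\; \pi(\theta)\,
  \prod_{i=1}^{n} \frac{\pi(\theta_i \mid x_i)}{\pi(\theta_i)}
```

Each marginal posterior enters as a likelihood-like factor once its own marginal prior has been divided out, which is where the compatibility of $\pi(\theta_i)$ and $\pi(\theta)$ matters.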



Dec 9  Thu  Alison Etheridge (Oxford)  Statistics Seminar  
14:00  Modelling evolution in a spatial continuum: the spatial $\Lambda$-Fleming-Viot process  
Lecture Theatre 6  
Abstract: One of the outstanding successes of mathematical population genetics is Kingman's coalescent. This provides a simple and elegant description of the genealogical trees relating individuals in a sample of neutral genes from a panmictic population, that is, one in which every individual is equally likely to mate with every other and all individuals experience the same conditions. But real populations are not like this. Spurred on by the recent flood of DNA sequence data, an enormous industry has developed that seeks to extend Kingman's coalescent to incorporate things like variable population size, natural selection and spatial and genetic structure. But a satisfactory approach to populations evolving in a spatial continuum has proved elusive. In recent joint work with Nick Barton, IST Austria, we introduced a framework for modelling the evolution of populations distributed in a spatial continuum. This leads to a new class of measure-valued processes which we will describe and, as time permits, explore in this talk. 



Dec 16  Thu  Grant Bigg (Sheffield)  Statistics Seminar  
14:00  Using icebergs as a tool in geoscience: how did the needle get into the haystack?  
Lecture Theatre 6  
Abstract: Since the sinking of the Titanic in 1912, icebergs have possessed a powerful aura for polar navigation. However, they are not only a threat to shipping but tell us about climate change, and the sediments dropped from them are key indicators of past climate fluctuations around the globe. In this talk the science of icebergs is explored, paying particular attention to where it intersects with sometimes difficult statistical issues. The power of statistical-dynamical modelling of icebergs to reveal new and interesting facts about past and present climate change is shown. The statistical analysis of remote sensing images is seen to be a powerful tool for aiding navigation as the Arctic sea routes are opened up. And finally, the use of systems control theory will be seen to offer the possibility of a new view of the evolution of the Greenland ice sheet over the last century. 



Feb 17  Thu  Mark Strong (University of Sheffield)  Statistics Seminar  
14:00  Managing Structural Uncertainty in Health Economic Decision Models  
Lecture Theatre 6  
Abstract: It was George Box who famously wrote 'Essentially, all models are wrong'. Given our limited understanding of the highly complex world in which we live this statement seems entirely reasonable. Why then, in the context of health economic decision modelling, do we often act as if our models are right even if we know that they are wrong? Imagine we have built a deterministic mathematical model to predict the costs and health effects of a new treatment, in comparison with an existing treatment. The model will be used by NICE to inform the decision as to whether to recommend the new treatment for use in the NHS. The inputs to the model are uncertain, and we quantify the effect of this input uncertainty on the model output using Monte Carlo methods. We may even quantify the value of obtaining more information. We present our results to NICE as a fait accompli. But, if we believe George Box then surely we should consider that our model output, and our uncertainty analysis, and our estimates of the value of information are all 'wrong' because they are generated by a model that is 'wrong'! The challenge is to quantify how wrong. This seminar will explore the problem of structural uncertainty in health economic decision models, along with some suggested approaches to managing this uncertainty. 
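The Monte Carlo step described above can be sketched generically as follows; the model and input distributions here are hypothetical placeholders for illustration, not taken from the talk:

```python
import numpy as np

def probabilistic_sensitivity_analysis(model, samplers, n=10_000, seed=0):
    """Monte Carlo propagation of input uncertainty through a
    deterministic decision model: draw each uncertain input from its
    distribution, evaluate the model, and summarise the output."""
    rng = np.random.default_rng(seed)
    outputs = np.array([model(*(s(rng) for s in samplers)) for _ in range(n)])
    return outputs.mean(), outputs.std()

# hypothetical model: incremental net benefit = effect * threshold - cost
mean_inb, sd_inb = probabilistic_sensitivity_analysis(
    lambda effect, cost: effect * 20_000 - cost,
    [lambda r: r.normal(0.1, 0.02),    # uncertain health effect (QALYs)
     lambda r: r.normal(1000, 200)])   # uncertain incremental cost
```

The point of the talk is precisely that this summary captures only input uncertainty: structural error in `model` itself is not represented.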



Feb 17  Thu  Siti Rahayu (University of Sheffield)  Statistics Seminar  
14:30  Interpretation Methods for Multivariate Control Chart Signals  
Lecture Theatre 6  
Abstract: Multivariate control charts have been the most popular tool among quality control/process control researchers when it comes to monitoring multivariate processes. The impact of correlation among process variables on multivariate process performance, the problem of multiplicity in hypothesis testing, and the difficulty of monitoring a large number of univariate control charts simultaneously can all be addressed readily by implementing a multivariate control chart. The only drawback of using a multivariate control chart is that once the out-of-control signal is triggered, interpretation of the signal is potentially difficult. A number of interpretation methods have been proposed by researchers, but so far the methods give inconsistent results. Some of the interpretation methods will be introduced and their strengths and weaknesses discussed. A new approach will be introduced as another option for interpreting a multivariate control chart signal. 
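For reference, the most common multivariate chart statistic is Hotelling's $T^2$; a minimal sketch (in practice the in-control mean and covariance would be estimated from a reference sample, and the chart signals when $T^2$ exceeds its control limit):

```python
import numpy as np

def hotelling_t2(x, mean, cov):
    """Hotelling's T^2 for one multivariate observation:
    T^2 = (x - mu)' S^{-1} (x - mu)."""
    d = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(d @ np.linalg.solve(np.asarray(cov, dtype=float), d))
```

The interpretation problem discussed in the talk is exactly that a large $T^2$ does not by itself say which variable, or which combination of variables, triggered the signal.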



Mar 3  Thu  Ajay Jasra (Imperial College)  Statistics Seminar  
14:00  On the stability of a class of sequential Monte Carlo methods in High Dimensions  
Lecture Theatre 6  
Abstract: We investigate the stability of a Sequential Monte Carlo (SMC) method applied to the problem of sampling from a single target density on $R^d$ for large $d$. It is well known that, using a single importance sampling step, one produces an approximation for the target distribution that deteriorates as the dimension $d$ increases, unless the number of Monte Carlo samples $N$ increases at an exponential rate in $d$. This degeneracy can be avoided by introducing a sequence of artificial targets, starting from a `simple' target density and moving to the one of interest, and using an SMC method to sample from the sequence. Using this class of SMC methods with a fixed number of samples, one can produce an approximation for which the effective sample size (ESS) converges to a random variable $\varepsilon_N$ as $d \to \infty$, such that $1<\varepsilon_{N}$  
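A generic tempered SMC sampler of the kind described can be sketched as follows; the bridging schedule, move kernel and step size are illustrative choices, not those analysed in the talk:

```python
import numpy as np

def tempered_smc(logprior, loglik, particles, betas, n_moves=5, step=0.5, seed=0):
    """Sketch of an SMC sampler through the bridging densities
    pi_beta(x) ∝ prior(x) * exp(beta * loglik(x)), for an increasing
    ladder 0 < beta_1 < ... < beta_K = 1, with multinomial resampling
    and a few random-walk Metropolis moves per stage."""
    rng = np.random.default_rng(seed)
    x = np.asarray(particles, dtype=float)
    n = len(x)
    beta_prev, ess = 0.0, float(n)
    for beta in betas:
        logw = (beta - beta_prev) * loglik(x)        # incremental weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        ess = 1.0 / np.sum(w ** 2)                   # effective sample size
        x = x[rng.choice(n, n, p=w)]                 # multinomial resampling
        for _ in range(n_moves):                     # rejuvenation moves
            prop = x + step * rng.standard_normal(n)
            logacc = (logprior(prop) + beta * loglik(prop)
                      - logprior(x) - beta * loglik(x))
            x = np.where(np.log(rng.random(n)) < logacc, prop, x)
        beta_prev = beta
    return x, ess
```

Starting the particles from the `simple' density and tracking the ESS across stages is exactly the quantity whose large-d limit the talk studies.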


Mar 17  Thu  Martijn Pistorius (Imperial College)  Statistics Seminar  
14:00  Maximal increments of random walks and Lévy processes  
Lecture Theatre 6  
Abstract: A random walk reflected at its minimum is equal to the random walk minus its running minimum. The reflected process plays a role in various applications. It is related to the method of cumulative sums (CUSUM) used in mathematical statistics, and has been employed in various areas of applied probability, such as queueing theory, mathematical finance and mathematical genetics. For a random walk whose step-size distribution has finite negative mean and satisfies Cramér's condition, we show that the current value, the rescaled maximum and the overshoot are asymptotically independent, and identify explicitly the limit distribution of the overshoot. We obtain analogous results for the corresponding statistics of a Lévy process. As a corollary we obtain a factorization of the exponential distribution. This is joint work with A. Mijatović. 
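The reflected process defined in the first sentence is straightforward to compute; a minimal sketch:

```python
def reflected_walk(increments):
    """Random walk reflected at its running minimum:
    R_n = S_n - min_{0<=k<=n} S_k, with S_0 = 0.
    This is the quantity underlying one-sided CUSUM charts."""
    s, running_min, r = 0.0, 0.0, [0.0]
    for inc in increments:
        s += inc
        running_min = min(running_min, s)
        r.append(s - running_min)
    return r
```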



Mar 24  Thu  Tusheng Zhang (University of Manchester)  Statistics Seminar  
14:00  
Lecture Theatre 6  


Apr 7  Thu  Adrian Bowman (University of Glasgow)  Statistics Seminar  
13:30  Flexible regression models for environmental applications  
Lecture Theatre 6  
Abstract: Additive, and more general nonparametric, approaches to modelling extend standard regression methods by allowing very flexible, but smooth, relationships between variables of interest. The role of these models in environmental applications, where there is a need to model complex forms of spatial and temporal trends, as well as spatial and temporal correlation, will be discussed. Technical aspects of the talk will include computational strategies for spatio-temporal smoothing and ways of extending standard inferential methods. The data structures considered will include river networks as well as more standard spatial domains. Applications will include the modelling of SO2 pollution over Europe, water quality in the River Tweed and rainfall-flow response in the River Dee. 



Apr 7  Thu  Oztas Ayhan (Middle East Technical University, Turkey)  Statistics Seminar  
15:00  Memory recall errors and their relation to survey response  
Lecture Theatre 5  
Abstract: This talk covers a study which compares self-reports during an interview with staff and students who attended a University health centre with the records of visits to the same health centre over the previous 12 months. The design of the study reflects the effects of the importance of the event, the duration since the event, the frequency of occurrence of the event, the measurement scale of the event, and bounded versus unbounded recalling. In order to assess the extent of recall error, responses to retrospective questions on health centre visits are compared with administrative records. Statistical models are proposed for short- and long-term human memory recall error effects on responses. 



May 5  Thu  Jim Smith (University of Warwick)  Statistics Seminar  
14:00  Controlling a Remote Bayesian from Being Irrational  
Lecture Theatre 6  
Abstract: UK military commanders have a degree of devolved decision authority delegated from command and control (C2) regulators, and they are trained and expected to act rationally and accountably. Therefore, from a Bayesian perspective, they should be subjective expected utility maximizers. In fact they largely appear to be so. However, when current tactical objectives conflict with broader campaign objectives there is a strong risk that fielded commanders will lose rationality and coherence. By systematically analysing the geometry of their expected utilities, arising from a utility function with two attributes, we demonstrate that even when a remote C2 regulator can predict only the likely broad shape of her agents' marginal utility functions it is still often possible for her to identify robustly those settings where the commander is at risk of making inappropriate decisions. 



May 12  Thu  Lee Fawcett (University of Newcastle)  Statistics Seminar  
14:00  
Lecture Theatre 6  


May 19  Thu  Mathew Penrose (University of Bath)  Statistics Seminar  
14:00  Limit Theorems in Stochastic Geometry with Applications  
Lecture Theatre 6  
Abstract: For an empirical point process governed by a probability density function in d-space, consider functionals obtained by summing over each point some function which is locally determined. General laws of large numbers and central limit theorems for such functionals are known. We discuss such results, their extensions to point processes in manifolds, associated local limit theorems, and applications to particular functionals such as multidimensional spacings statistics, dimension estimators and entropy estimators. 



Sep 29  Thu  Sawaporn Siripanthana (Sheffield)  Statistics Seminar  
14:20  Multivariate surveillance for outbreak detection  
LT6  
Abstract: Early detection with a low false alarm rate is the main aim of outbreak detection as used in public health surveillance or in regard to bioterrorism. Several statistical methods have been implemented and used for monitoring the occurrence of outbreaks. For simplicity, univariate surveillance or parallel surveillance (separate monitoring of each continuous series) is usually implemented in practice. However, this has severe limitations, arising from the multiplicity of multiple hypothesis testing and from ignoring correlation between series, which might reduce the detection performance of the system if the data are truly correlated. Additionally, correlation within series is another issue which is often ignored but which should be considered, as health data are normally dependent over time. This talk will summarise existing univariate methods used for outbreak detection, with their strengths and weaknesses, and look at extensions to the multivariate case. For dimensionality reduction in multivariate surveillance, a method based on the sufficiency property will be introduced. 



Oct 12  Wed  Mark Davis (Imperial College)  Statistics Seminar  
14:15  Pathwise stochastic calculus and applications to options on realized variance.  
K14  
Abstract: If $S_t, t\in[0,T]$ is the price of a financial asset, the realized variance is $\mathrm{RV}^d_T=\sum_{i=1}^n(\log(S_{t_i}/S_{t_{i-1}}))^2$ where $t_i$ is a pre-specified increasing sequence of times in $[0,T]$. Most of the literature on this subject studies the continuous-time limit, which is $\mathrm{RV}^c_T=[\log S]_T$, the quadratic variation of the `log-returns' process $X_t=\log S_t$. Questions to be answered are how to price options on realized variance consistently with other options in the market and how to hedge these options. Recent research has focussed on model-free approaches to these questions: we want to say as much as possible without committing ourselves to any particular stochastic process realization of $S_t$. However, this poses an immediate problem of interpretation in the passage from $\mathrm{RV}^d$ to $\mathrm{RV}^c$: we cannot use the standard probabilistic notions of convergence, since we do not have a probability space! An answer to this problem is provided in Hans Föllmer's 1981 paper Calcul d'Itô sans probabilités, where he derives an Itô formula using just real analysis for paths having the `quadratic variation property'. In some cases, we need an Itô formula valid for functions whose second derivatives are not continuous, say $f\in {\cal H}^2$. The standard approach to this in stochastic analysis goes via the Tanaka formula and local time, so the question arises whether we can have a pathwise theory of local time. Föllmer, with a Diploma student, did consider this question, but it seems there may be decisive advantages in considering `Lebesgue' partitions rather than `Riemann' partitions as Föllmer did, thereby getting a direct connection with Lévy's downcrossing representation of local time. This is a preliminary account of work in progress with Jan Obłój (Oxford). 
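The discrete realized variance defined above is a one-liner to compute from observed prices:

```python
import math

def realized_variance(prices):
    """Discrete realized variance RV^d_T = sum_i (log(S_{t_i}/S_{t_{i-1}}))^2
    over a grid of observed prices, as in the definition above."""
    return sum(math.log(b / a) ** 2 for a, b in zip(prices, prices[1:]))
```

The talk concerns what this sum converges to as the grid is refined when no probabilistic model for the prices is assumed.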



Oct 12  Wed  Claudia Klüppelberg (TU Munich)  Statistics Seminar  
15:45  An introduction to COGARCH modelling with financial applications.  
K14  
Abstract: Modelling of stochastic volatility has triggered important research in the theory of stochastic processes. New models have been proposed to capture the ``stylized facts'' of volatility such as jumps, heavy-tailed marginals, long-range dependence, and clusters in extremes. In recent years particular emphasis has been given to continuous-time modelling, since financial time series in liquid markets are high-frequency and irregularly spaced because of random trading times. Natural candidates among continuous-time models with jumps are Lévy or Lévy-driven models, and we shall discuss some of the prominent examples for volatility modelling. Special emphasis is given to COGARCH models, which are continuous-time versions of the very popular GARCH models. 



Oct 20  Thu  Simon Wood (University of Bath)  Statistics Seminar  
14:00  
LT6  


Nov 3  Thu  Nicola Loperfido (Universita degli Studi di Urbino)  Statistics Seminar  
14:00  Kurtosis and the Black Swan: some Fine Financial Findings  
LT6  
Abstract: The Black Swan: The Impact of the Highly Improbable is a book which became a bestseller by pointing out the relevance of extreme (i.e. tail) events in finance. It also depicted statisticians as being totally inept at dealing with such events, but very apt at deceiving themselves and others using the normal distribution and more complicated models. This is unfair, given the vast statistical literature devoted to non-normal models for extreme financial events. However, it is also true that most of it is better suited to professional statisticians than to financial analysts with limited statistical backgrounds and little time to learn advanced statistical techniques. These analysts might find kurtosis a simple and useful tool for dealing with tail events. This seminar examines some properties of kurtosis and applies them to financial decisions. Theoretical results will be illustrated using data collected from several financial markets. 
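The tail-weight summary in question is the ordinary moment-based sample kurtosis; a plain sketch:

```python
def sample_kurtosis(x):
    """Moment-based sample kurtosis b2 = m4 / m2^2, where m_k is the
    k-th central sample moment. For a normal population this is close
    to 3 in large samples; heavy tails push it above 3."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2
```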



Nov 17  Thu  Sofia Dias (University of Bristol)  Statistics Seminar  
14:00  Checking consistency in Mixed Treatment Comparison Meta-analysis  
LT6  
Abstract: Indirect and mixed treatment comparisons (MTC), also known as network meta-analysis, represent an important development in evidence synthesis, particularly in decision-making contexts. Rather than pooling information on trials comparing treatments A and B, A and C, B and C etc. separately, MTC combines data from randomised comparisons, A vs B, A vs C, A vs D, B vs D, and so on, to deliver an internally consistent set of estimates while respecting the randomisation in the evidence. MTC allows coherent judgements on which of several treatments is the most effective and produces estimates of the relative effects of each treatment compared to every other treatment in a network, even though some pairs of treatments may not have been directly compared. However, doubts have been expressed about the validity of MTC, particularly the assumption of consistency between ``direct'' and ``indirect'' evidence. Inconsistency can be thought of as a conflict between ``direct'' evidence on a comparison between treatments B and C, and ``indirect'' evidence gained from A vs C and A vs B trials. Like between-trial heterogeneity, inconsistency is caused by effect modifiers, and specifically by an imbalance in the distribution of effect modifiers in the direct and indirect evidence. I will begin by defining inconsistency as a property of ``loops'' of evidence, and then provide details of the node-split and other, simpler, methods to assess whether there is inconsistency in a network and where it might be located. The merits and drawbacks of each method will be discussed using illustrative examples. 
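The consistency idea for a single loop can be sketched with a classical Bucher-type comparison (illustrative numbers; independence of the direct and indirect sources is assumed):

```python
def inconsistency_z(d_direct, var_direct, d_indirect, var_indirect):
    """Compare direct and indirect estimates of the same treatment
    effect with a z-statistic; a large |z| flags inconsistency in
    the evidence loop."""
    w = d_direct - d_indirect
    return w / (var_direct + var_indirect) ** 0.5

# indirect B-vs-C effect from A-vs-B and A-vs-C trials
d_ab, d_ac = 0.5, 0.8
d_bc_indirect = d_ac - d_ab
z = inconsistency_z(0.9, 0.04, d_bc_indirect, 0.05)
```

The node-split method described in the talk generalises this single-loop check to the whole network within one model.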



Dec 1  Thu  Dario Spano (University of Warwick)  Statistics Seminar  
14:00  Canonical correlation for dependent gamma random measures.  
LT6  
Abstract: We will focus on the construction of dependent completely random measures (CRMs), with fixed margins, motivated by applications to Bayesian inference and Population Genetics. In particular, we will deal with vectors of gamma CRMs and characterize their distribution in terms of their canonical correlations, that is: we characterize the class of all dependent gamma measures whose finite dimensional distributions are given by a transition kernel with orthogonal polynomial eigenfunctions. We thus provide a results that shows that the canonical correlations (i.e. the kernel eigenvalues) are mixed moments of linear functionals of Dirichlet means evaluated at a random function. MarkovKrein and other identities on Dirichlet random means thus allow for several explicit representation for joint and conditional moment measures of our bivariate CRMs. We provide a few illustrations that show how some wellknown dependent vectors are included in our more general framework. Finally, if time allows, we will discuss an extension to measurevalued Markov processes. 



Dec 15  Thu  Peter Craig (University of Durham)  Statistics Seminar  
14:00  Ecotoxicological Risk Assessment: Beyond the Standard Species Sensitivity Distribution Model  Advantages and Benefits of Being Bayesian and Matters Arising.  
LT6  
Abstract: Ecotoxicological risk assessment deals with the potential for unwanted ecological effects of chemicals. A key statistical tool for risk assessors and managers is the use of the species sensitivity distribution (SSD) model as a proxy for the effects of a chemical in real ecosystems; in particular, the "safe concentration" calculation is based on an estimate of the 5th percentile of the SSD, obtained from a relatively small amount of data. The standard procedure (Aldenberg and Jaworska, 2000) is based on a lognormal model assuming exchangeability. Much of this talk will discuss a number of recent developments in the modelling and use of SSDs: drawing strength from other data; use of loss functions; assessing and modelling non-exchangeability and the consequences for decision-making; handling the issue of "measurement error" (inter-test variation); understanding and exploiting inter-species correlation; hierarchical random effects models. In parallel, I will consider the ongoing shift from frequentist to Bayesian methodology/philosophy in ecotoxicology, the advantages for the statistician of the Bayesian approach and the benefits this provides for ecotoxicology. I will finish by discussing some of the problems of being Bayesian, the questions they raise and some of the issues which Bayesians need to address. 
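For orientation, a naive plug-in 5th percentile of a lognormal SSD can be computed as below; this is a point estimate only, whereas the standard procedure cited additionally accounts for sampling uncertainty via extrapolation constants:

```python
import math
import statistics

def hc5_lognormal(log_concentrations):
    """Plug-in 5th percentile of a lognormal species sensitivity
    distribution: exp(mean + z_0.05 * sd) on the log scale."""
    mu = statistics.mean(log_concentrations)
    sigma = statistics.stdev(log_concentrations)
    z05 = -1.6448536269514722  # standard normal 5th percentile
    return math.exp(mu + z05 * sigma)
```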



Feb 9  Thu  Vanessa Didelez (University of Bristol)  Statistics Seminar  
14:00  Mendelian Randomisation as an Instrumental Variable Approach to Causal Inference  
LT6  
Abstract: In epidemiology we often want to estimate the causal effect of an exposure on a health outcome based on observational data, where the possibility of unobserved confounding cannot be excluded. To deal with this problem, it has recently become popular to use a technique called Mendelian randomisation, which exploits the fact that the exposure is associated with a genetic variant; this variant can be assumed to be unaffected by the same confounding factors, making it suitable as a so-called instrumental variable. In my talk, this technique is illustrated with various examples, in particular with the effect of alcohol consumption on blood pressure / hypertension. Different methods of using an instrumental variable to estimate the causal effect on a binary outcome are compared based on their theoretical properties as well as by simulation. Finally, I will discuss whether a Bayesian approach is useful in the context of Mendelian randomisation. References: Didelez and Sheehan (2007). Mendelian randomisation as an instrumental variable approach to causal inference, Statistical Methods in Medical Research, 16, 309-330. Didelez, Meng and Sheehan (2010). Assumptions of IV methods for observational epidemiology, Statistical Science, 25, 22-40. Palmer, Sterne, Harbord, Lawlor, Sheehan, Meng, Granell, Davey Smith, Didelez (2011). Instrumental variable estimation of causal risk ratios and causal odds ratios in Mendelian randomization analyses, The American Journal of Epidemiology, 173 (12). Jones, Thompson, Didelez and Sheehan (2012). On the choice of parameterisation and priors for the Bayesian analyses of Mendelian randomisation studies. To appear in Statistics in Medicine. 



Feb 16  Thu  Emma Jones (University of Sheffield)  Statistics Seminar  
14:00  Using a Bayesian Hierarchical Model for Tree-Ring Dating  
LT6  
Abstract: The widths of tree-rings are determined by several factors, including a local climatic signal apparent in that year and the tree's growth trend. The climatic signal influences growth such that if the summer is warm and wet, the ring tends to be wider than if the summer is cold and dry. The growth trend describes the fast growth of the tree when it is young, producing wide rings, followed by narrower rings as it ages. Other factors such as the soil conditions, the presence of pests and diseases, and competition for light and nutrients can also affect the ring width. The impact of these latter factors is collectively known as noise. It is assumed that trees within the same geographical region are exposed to the same climatic signal in each year, but that this differs from year to year. Tree-ring dating involves matching sequences of tree-ring widths from timbers of unknown age to dated sequences known as 'master' chronologies. Before matching takes place, all data are pre-processed to remove the growth trends. The timbers of unknown age (typically from a single building or woodland) are, firstly, sequentially matched against one another to identify the relative offsets with the 'best' match. The sequence produced is known as a 'site' chronology. The site chronology is then further matched to a local master chronology, to attempt to produce a date estimate for the site chronology. Traditionally the quality of the matches (both within the site chronology and between the site chronology and the master chronology) is assessed via the classical statistical t-test. A match at a particular offset is only considered to be 'best' if it produces the largest t-value of all of the possible offsets and is greater than (an arbitrary value of) 3.5. The success rate of dating varies within sites and across regions; the national average is approximately 60-70%, but in some geographical areas the success rate can be much lower. 
One of the reasons for this is that the t-test does not utilise the wide range of information that a Bayesian model for tree-ring dating could draw upon. A Bayesian model for tree-ring dating allows important prior information on parameters to be drawn into the inference process; this prior information can be taken from trees and can also be elicited from expert dendrochronologists. The model assumes that each ring width is composed of an overall climatic signal and some noise, and can be further extended to include climatic signals at varying geographic scales. Probabilities for a match at each offset can be produced conditional on the data and the prior specifications. The method removes the need to identify a single 'best' match, but it does rely on careful prior specification of parameters. Consequently, we have collated ring width data from trees of known age from several woods in the UK and are using these to provide informative prior knowledge. 
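As a point of reference for the classical procedure described above, the offset-by-offset t-test matching can be sketched as follows. This is an illustrative reconstruction, not the dendrochronologists' production software; the function name and the minimum-overlap parameter are assumptions.

```python
import numpy as np

def match_offsets(site, master, min_overlap=30):
    """Slide a (detrended) site chronology along a master chronology and
    return the t-value of the Pearson correlation at each feasible offset."""
    results = []
    for offset in range(len(master) - min_overlap + 1):
        overlap = min(len(site), len(master) - offset)
        if overlap < min_overlap:
            break
        r = np.corrcoef(site[:overlap], master[offset:offset + overlap])[0, 1]
        # Classical t-statistic for a correlation based on `overlap` points
        t = r * np.sqrt((overlap - 2) / (1.0 - r**2))
        results.append((offset, t))
    return results
```

In the traditional scheme, a match at a particular offset would only be accepted if its t-value is both the maximum over all offsets and greater than the conventional threshold of 3.5.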



Feb 16  Thu  Seungjin Han (University of Sheffield)  Statistics Seminar  
14:30  Adaptive filtering for algorithmic pairs trading  
LT6  
Abstract: Pairs trading as a statistical arbitrage methodology has received considerable attention and popularity since its initial application in the 1980s. It is based on the assumption that a spread of two assets is mean-reverting, and any fluctuations that violate this are exploited in order to realize profits. For real-time detection of mean reversion, we employ a time-varying autoregressive model in state-space form, online estimation of which is achieved by recursions of Kalman filtering and adaptive forgetting. Two novel algorithms for a variable forgetting factor are proposed and compared with a standard recursive least squares algorithm with adaptive memory. 
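A minimal sketch of the kind of online state-space estimation the abstract describes is given below, assuming a time-varying AR(1) spread model with random-walk coefficients. The noise variances `q` and `r` and the function name are illustrative choices, and the speaker's adaptive-forgetting variants are not reproduced here.

```python
import numpy as np

def kalman_tvar(spread, q=1e-5, r=1e-3):
    """Online Kalman estimation of a time-varying AR(1) model
    s_t = a_t + b_t * s_{t-1} + e_t, where the coefficient vector
    (a_t, b_t) follows a random walk with variance q."""
    theta = np.zeros(2)                      # current estimate of (a_t, b_t)
    P = np.eye(2)                            # state covariance
    Q = q * np.eye(2)                        # random-walk (state noise) covariance
    history = []
    for t in range(1, len(spread)):
        H = np.array([1.0, spread[t - 1]])   # regressor for time t
        P = P + Q                            # predict step
        e = spread[t] - H @ theta            # innovation
        S = H @ P @ H + r                    # innovation variance
        K = P @ H / S                        # Kalman gain
        theta = theta + K * e                # update step
        P = P - np.outer(K, H @ P)
        history.append(theta.copy())
    return np.array(history)
```

Mean reversion of the spread at time t corresponds to |b_t| < 1; a variable-forgetting-factor scheme would, roughly speaking, replace the fixed Q with an adaptively tuned effective memory.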



Feb 23  Thu  Jim Griffin (University of Kent)  Statistics Seminar  
14:00  Shrinking to some purpose  
LT6  
Abstract: In Bayesian statistics there has recently been interest in using priors whose density has a spike at zero in regression problems. These priors can lead to adaptive shrinkage of regression effects and so can be used for sparse regression problems where many of the regression coefficients are assumed to be zero (or very close to zero). This talk will consider the Normal-Gamma prior and extensions of it to encourage more general forms of shrinkage. For example, we might want to shrink differences of regression effects, or we might want to allow the ``importance'' of regression effects to change over time. 



Mar 1  Thu  Chris Sherlock (University of Lancaster)  Statistics Seminar  
14:00  A hidden Markov model for disease interactions  
LT6  
Abstract: Interactions between parasite species in a host are of great interest to ecologists but are often too complex to predict a priori. A longitudinal study of a population of field voles was undertaken with presence or absence of six different parasite species measured repeatedly. Although trapping sessions were regular, a different set of voles was caught at each session leading to incomplete profiles for all subjects. A simple analysis, which discards much of the data, has already been carried out; we offer a more powerful alternative. We use a discrete-time hidden Markov model for each disease with transition probabilities dependent on covariates via a set of logistic regressions. For each disease the hidden states for each of the other diseases at a given time point form part of the covariate set for the Markov transition probabilities from that time point to the next. This allows us to gauge the influence of each parasite species on the transition probabilities for each of the other parasite species. Inference is performed via a Gibbs sampler, one iteration of which cycles through each of the diseases, first using an adaptive Metropolis-Hastings step to sample from the conditional posterior of the covariate parameters for that particular disease given the hidden states for all other diseases and then sampling from the hidden states for that disease given the parameters using the Forward-Backward algorithm. 
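For reference, the Forward-Backward recursions used inside such a Gibbs sampler can be sketched for a generic discrete HMM as follows. This version computes smoothed state marginals; the sampler in the talk would instead draw hidden-state trajectories, and the disease-specific covariate structure is omitted.

```python
import numpy as np

def forward_backward(obs, A, emis, pi):
    """Normalised forward-backward recursions for a discrete HMM.
    A: (K, K) transition matrix; emis: (K, M) emission probabilities;
    pi: (K,) initial distribution; obs: sequence of observation indices.
    Returns the smoothed marginals P(state_t = k | all observations)."""
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    beta = np.ones((T, K))
    alpha[0] = pi * emis[:, obs[0]]
    alpha[0] /= alpha[0].sum()           # normalise for numerical stability
    for t in range(1, T):                # forward pass
        alpha[t] = (alpha[t - 1] @ A) * emis[:, obs[t]]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):       # backward pass
        beta[t] = A @ (emis[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)
```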



Mar 29  Thu  Eleanor Stillman (University of Sheffield)  Statistics Seminar  
14:00  Optimal design for multi-response experiments  
LT6  
Abstract: Many statistical investigations require data to be collected so that the influence of explanatory variables on responses of interest can be deduced. Once there is more than a single response variable, there are potential conflicts of interest in selecting experiments which are efficient at estimating all responses. In this talk I will begin by introducing the general ideas of optimal experimental design and then focus on extensions to multiple responses. In particular, I will introduce a new composite optimality criterion which seeks to estimate a primary continuous response efficiently particularly when a second, binary, response has a positive outcome. I will also examine the practically important case of simultaneous estimation of both mean and variance of a single response. 



Apr 26  Thu  Ronnie Loeffen (University of Manchester)  Statistics Seminar  
14:00  Spectral representations for affine processes  
LT6  
Abstract: Affine processes are widely used in various areas of mathematical finance, like credit risk modelling, interest rate modelling and stochastic volatility models. One of the advantages of working with affine processes is that one can compute European option prices via Laplace/Fourier inversion after solving a system of nonlinear, first order ODEs. However, an explicit solution to this system exists only in a limited number of cases and numerically solving it seems cumbersome. Based on the work of Ogura (1974/75) on continuous-state branching processes, we discuss an alternative method in which the system of ODEs is replaced by a number of decoupled, linear, first order PDEs. Pros and cons of the method will be indicated and also some examples will be provided. 



May 3  Thu  Simon Wood (University of Bath)  Statistics Seminar  
14:00  Simple statistical models for complex ecological data  
LT6  
Abstract: Much ecological theory is based on models that are relatively simple to write down and simulate from, while at the same time being capable of displaying very complicated dynamics. This talk suggests that such near chaotic dynamics provide a case where it is sensible to abandon conventional likelihood or Bayesian approaches in favour of inference based on carefully chosen statistics of the data. The statistics should be designed to avoid the irregularity produced by highly nonlinear dynamics, while still being informative about the dynamic structure of the system being modelled. A simple approach to inference is proposed, which requires only the ability to simulate from the model. The approach has links to ABC, generalized method of moments, indirect inference and similar approaches, but requires rather little tuning. 



May 9  Wed  Sotiris Bersimis (University of Piraeus)  Statistics Seminar  
14:00  Multivariate SPC with emphasis on multi-attribute processes  
LT10  
Abstract: Initially, the area of multivariate SPC will be briefly overviewed and the basic procedures for implementing multivariate statistical process control via control charting will be reviewed. Specifically, multivariate extensions of all kinds of univariate control charts, such as multivariate Shewhart-type control charts, MCUSUM control charts and MEWMA control charts, will be summarized, and the problem of interpreting an out-of-control signal will be briefly discussed. Additionally, since little work has been done in the literature to deal with multivariate attribute processes, which are very important in practical production processes, the presentation will close with the special case that arises when the quality of the process of interest is not characterized by continuous characteristics. After the key points of multi-attribute processes are presented, some procedures for controlling such processes will be discussed. 
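As a concrete anchor for the multivariate Shewhart-type charts mentioned above, the classical Hotelling T^2 charting statistic for a single observation can be sketched as follows, assuming the in-control parameters are known; the helper name is mine.

```python
import numpy as np

def hotelling_t2(x, mu, Sigma):
    """Hotelling T^2 statistic for a single p-variate observation x,
    given the in-control mean mu and covariance Sigma.  With known
    parameters, T^2 is compared against a chi-square(p) control limit;
    the chart signals out-of-control when the limit is exceeded."""
    d = x - mu
    return float(d @ np.linalg.solve(Sigma, d))
```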



May 17  Thu  Lee Fawcett (University of Newcastle)  Statistics Seminar  
14:00  Estimating return levels from serially dependent extremes  
LT6  
Abstract: In this talk, we investigate the relationship between return levels of a process and the strength of serial correlation present in the extremes of that process. Estimates of long-period return levels are often used as design requirements, and peaks over thresholds (POT) analyses have, in the past, been used to obtain such estimates. However, analyses based on such declustering schemes are extremely wasteful of data, often resulting in great estimation uncertainty represented by very wide confidence intervals. Using simulated data, we show that, provided the extremal index is estimated appropriately, using all threshold excesses can give more accurate and precise estimates of return levels, allowing us to avoid altogether the sometimes arbitrary process of cluster identification. We then apply our method to two data examples concerning sea-surge and wind speed extremes. 
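The return-level calculation underlying such a POT analysis can be sketched as follows, using the standard GPD return-level formula with an extremal-index adjustment (cf. Coles, 2001). This is the textbook parameterisation, not necessarily the exact estimator used in the talk.

```python
from math import log

def return_level(m, u, sigma, xi, zeta_u, theta=1.0):
    """m-observation return level from a GPD fitted to all excesses of
    a threshold u, adjusted for serial dependence via the extremal
    index theta:
        x_m = u + (sigma / xi) * ((m * zeta_u * theta)**xi - 1)
    where zeta_u is the probability of exceeding u, sigma and xi are
    the GPD scale and shape, and theta = 1 recovers the iid case."""
    if abs(xi) < 1e-9:                   # Gumbel limit as xi -> 0
        return u + sigma * log(m * zeta_u * theta)
    return u + (sigma / xi) * ((m * zeta_u * theta) ** xi - 1.0)
```

Note that a smaller extremal index (stronger clustering of extremes) lowers the return level for the same marginal fit, which is one way of seeing why serial dependence matters for design values.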



Oct 4  Thu  Sigurd Assing (University of Warwick)  Statistics Seminar  
14:00  On the spatial dynamics of the stochastic heat equation  
Abstract: When modeling complex phenomena by random fields $u(x,t)$ depending on a $d$-dimensional space parameter $x$ and time $t$ it is often useful to describe the dynamical behaviour of these fields by stochastic partial differential equations (SPDEs). If a random field $u(x,t)$ is a solution of an SPDE then it is usually understood as a Markov process $u(\cdot,t)$, $t \geq 0$, taking values in a function space. Unfortunately this wipes out any structure of the solutions in the space parameter $x$. In this talk we recover this structure in the case where the SPDE is the so-called stochastic heat equation, which is a simple toy example. The method used is mainly based on the technique of enlargement of filtrations and on Malliavin calculus. There is hope that it can also be applied to other SPDEs. 



Oct 11  Thu  Andrew Beckerman (Sheffield (Animal and Plant Sciences))  Statistics Seminar  
14:00  Graphs and Covariance in Ecology and Evolution  
K14  
Abstract: Here I introduce two major research themes in ecology and evolution: food web networks and quantitative genetics. Food web network theory borrows heavily, if inelegantly, from graph theory, with vertices/nodes typically representing species and edges representing anything from binary connection to process. In this section I introduce two classes of food web models, and issues currently facing their use, centred on observation and process error. Quantitative genetics centres on estimating genetic variation and covariation among traits that are important to the survival and reproduction of organisms. We focus on these, represented as a variance-covariance matrix, because variation is required for evolution to happen, and positive and negative covariation represent constraints on what can happen among traits. In this section I introduce the hierarchical modelling we typically use, important eigensystem properties of the variance-covariance matrix, and recent transitions from parametric to Bayesian MCMC tools. The Bayesian MCMC methods appear to allow several types of comparisons among groups of individuals with strong inference. 



Oct 18  Thu  Markus Riedle (Kings College London)  Statistics Seminar  
14:00  The stochastic heat equation driven by cylindrical Lévy processes  
K14  
Abstract: The heat equation driven by Gaussian noise is the most fundamental and simplest example of a stochastic partial differential equation. Most of its properties and characteristics are well understood. However, given the restriction of Gaussian noise it is important to understand this fundamental equation if driven by a more general noise. In this talk we consider the heat equation driven by cylindrical Lévy processes. These kinds of processes were introduced together with D. Applebaum a few years ago and they are a natural generalisation of the Gaussian noise. We give several examples of cylindrical Lévy processes and introduce a stochastic integral with respect to these processes. In the main part, we explain how the heat equation can be solved and we show some of the phenomena which arise if the heat equation is no longer perturbed by a Gaussian noise but by a cylindrical Lévy process. 



Oct 25  Thu  Charles Taylor (University of Leeds)  Statistics Seminar  
14:00  Regression for circular data  
Abstract: We consider data of the form $(x_i,y_i)$ in which $x$ and/or $y$ is measured as an angle, and we seek to model a relationship in which $y$ can be predicted from $x$. Starting with a review of existing parametric models, we put these into a common framework and discuss problems with estimation. Various nonparametric models, which make use of circular kernels, are described, as well as their asymptotic behaviour and approaches to bandwidth selection. 
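One of the nonparametric estimators alluded to, Nadaraya-Watson regression with a von Mises (circular) kernel for an angular predictor, can be sketched as follows. This covers only the angular-predictor, linear-response case, and the function name is illustrative.

```python
import numpy as np

def nw_circular(x0, x, y, kappa=5.0):
    """Nadaraya-Watson estimate of E[y | angle x0] for an angular
    predictor x, using a von Mises kernel exp(kappa * cos(x0 - x_i)).
    kappa acts as an inverse bandwidth: larger kappa = more local fit."""
    w = np.exp(kappa * np.cos(x0 - x))   # periodic in (x0 - x_i) by construction
    return float(np.sum(w * y) / np.sum(w))
```

Because the kernel is a function of cos(x0 - x_i), the estimator automatically respects the wrap-around of the circle, which an ordinary Gaussian kernel on angles would not.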



Nov 1  Thu  Postgraduate talks  Mike Spence and Steph Llewelyn (Sheffield)  Statistics Seminar  
14:00  Parameter Estimation of Individual-based models (Mike) and Statistical Modelling of Fingerprints (Steph)  
K14  
Abstract: Mike's Talk: Parameter Estimation of Individual-based models. Individual-based models are increasingly used in ecological modelling as a way of trying to understand how individuals' behaviour leads to the emergent behaviour of the system. Generally the behaviour of the individuals is determined through a series of rules or algorithms, rather than described in a formal mathematical way, and this can represent a good way of capturing an ecologist's expertise and intuition. Quantifying uncertainty, estimating parameters and so on for a model of this sort are complicated by the fact that its probabilistic behaviour is implicit in its rules, rather than made explicit as in a more conventional statistical or stochastic model. This means that there is generally no explicit likelihood function available. I will discuss a number of methods of dealing with this and illustrate these methods with Railsback and Grimm's (2012) simplified model of woodhoopoe population dynamics. Stephanie's Talk: Statistical Modelling of Fingerprints. It is believed that fingerprints are determined in embryonic development. Unlike other personal characteristics, the fingerprint appears to be the result of a random process. For example, fingerprints of identical twins (whose DNA is identical) are distinct, and extensive studies have found little evidence of a genetic relationship in terms of types of fingerprint, certainly at the small scale. At a larger scale the pattern of ridges on fingerprints can be categorised as belonging to one of five basic forms: loops (left and right), whorls, arches and tented arches. The population frequencies of these types show little variation with ethnicity, and a list of the types occurring on the ten digits can be used as an initial basis for identification of individuals. However, such a system would not uniquely identify an individual, although the frequency of certain combinations could be extremely small. 
At a smaller scale various minutiae or singularities can be observed in a fingerprint. These include ridge endings and bifurcations, amongst others. Typical fingerprints have several hundred of these, as well as two key points (with the exception of a simple arch) referred to as the core and delta, which are focal points of the overall pattern of ridges. Modern identification systems are based upon ridge endings and bifurcations, not least because they are the easiest to determine automatically from image analysis. The configuration of these minutiae is unique to the individual. The presentation will give an introduction to fingerprints from a forensic context and also outline a method used for matching a finger mark to a fingerprint. 



Nov 15  Thu  Simon Spencer (University of Warwick)  Statistics Seminar  
14:00  Causal inference for biochemical networks  
Abstract: In observational studies it is impossible to distinguish between association and causation. To uncover causal relationships, interventions must be included in the experimental design. In complex systems, such as biochemical networks, there is frequently a high degree of association between interacting parts of the system. The aim of causal network inference is to untangle the causal structure behind these associations. In this study we developed a statistical model that captures the effect of inhibitors (an intervention) in a protein signalling network. We then used this model to perform causal network inference on protein microarray data from breast cancer cell lines. We were able to demonstrate that a causal inference approach increases the accuracy of the inferred networks. 



Nov 22  Thu  Barbel Finkenstadt (University of Warwick)  Statistics Seminar  
14:00  Modeling and inference for gene expression time series data (an overview)  
Abstract: A central challenge in computational modeling of dynamic biological systems is parameter inference from experimental time course measurements of gene expression. We present an overview of modeling approaches based on stochastic population dynamic models and their approximations. On the mesoscopic scale (small populations), we present a two-dimensional continuous-time Bayesian hierarchical model which has the potential to address the different sources of variability that are relevant to the stochastic modelling of transcriptional and translational processes at the molecular level, namely: intrinsic noise due to the stochastic nature of the birth and death processes involved in chemical reactions; extrinsic noise arising from the cell-to-cell variation of kinetic parameters associated with these processes; and noise associated with the measurement process. Inference is complicated by the fact that only the protein, and rarely other molecular species, is observed, which typically entails problems of parameter identification in dynamical systems. On the macroscopic (or large populations) scale, we introduce a mechanistic 'switch' model for encoding a continuous transcriptional profile of genes over time, with the aim of identifying the timing properties of mRNA synthesis, which is assumed to switch between periods of transcriptional activity and inactivity, each time leading to the transition to a new steady state, while mRNA degradation is an ongoing linear process. The model is rich enough to capture a wide variety of expression behaviours, including periodic genes. Finally, I will also give a brief introduction to some recent work on inferring the periodicity of the expression of circadian and other oscillating genes. Joint work with: Maria Costa, Dan Woodcock, Dafyd Jenkins, David Rand, Michal Komorowski (Warwick Systems Biology) 



Nov 29  Thu  Christopher Brignell (Nottingham)  Statistics Seminar  
14:00  Statistical shape analysis, with an application to chemoinformatics  
K14  
Abstract: Statistical methods for evaluating and comparing shapes are necessary in a wide range of disciplines. For example, in biology we may wish to classify an organism based on its shape, or in computer science we may wish to develop methods for automated face or fingerprint recognition. One emerging application is to molecular structures such as proteins and DNA, to investigate properties of chemical bonding. In this talk I will provide an introduction to shape analysis and then apply the results to chemoinformatics. 



Dec 6  Thu  Ian Vernon (Durham)  Statistics Seminar  
14:00  Galaxy Formation: A Bayesian Uncertainty Analysis  
K14  
Abstract: The question of whether large quantities of Dark Matter exist in our Universe is one of the most important problems in modern cosmology. This project deals with a complex model of the Universe known as Galform, developed by the ICC group at Durham University. This model simulates the creation and evolution of approximately 1 million galaxies from the beginning of the Universe until the current day, a process which is very sensitive to the presence of Dark Matter. A major problem that the cosmologists face is that Galform requires the specification of a large number of input parameters in order to run. The outputs of Galform can be compared to available observational data, and the general goal of the project is to identify which input parameter specifications will give rise to acceptable matches between model output and observed data, given the many types of uncertainty present in such a situation. As the model is slow to run and the input space large, this is a very difficult task. We have solved this problem using general techniques related to the Bayesian treatment of uncertainty for computer models. These techniques are centred around the use of emulators: fast stochastic approximations to the full Galform model. These emulators are used to perform an iterative strategy known as history matching, which identifies regions of the input space of interest. Visualising the results of such an analysis is a nontrivial task. The acceptable region of input space is a complex shape in high dimension. Although the emulators are fast to evaluate, they still cannot give detailed coverage of the full volume. We have therefore developed fast emulation techniques specifically targeted at producing lower-dimensional visualisations of higher-dimensional objects, leading to novel, dynamic 2- and 3-dimensional projections of the acceptable input region. 
These visualisation techniques allow full exploitation of the emulators, and provide the cosmologists with vital physical insight into the behaviour of the Galform model. 



Dec 13  Thu  Jenny Barrett (Leeds)  Statistics Seminar  
14:00  Identifying causal genetic variants and other related problems in statistical genetics  
K14  
Abstract: Genome-wide association (GWA) studies have been successful in recent years at finding associations between common genetic variants and disease by careful application of simple statistical methods. For most common diseases, this has led to the identification of a number of genetic regions that clearly harbour a genetic variant or variants that influence risk of disease. However, due to strong and complex patterns of correlation between genetic variants located close together, it is usually still unknown which variant(s), and often even which gene, in the region actually has a causal effect on the trait. We are applying statistical approaches to shed light on what is going on in the genetic regions associated with melanoma. Our primary approach is to select the most parsimonious model(s) that explain the association signal in the region (e.g. using penalized logistic regression of all variants in the region simultaneously), and then as a second step to look at the biological plausibility of the models. There are various outstanding problems in this area. Is there a more effective way of combining statistical and biological information? Regions may be genotyped at several different levels of density, right down to the highest resolution of knowing the entire genetic sequence in the region. If data are available at different densities on different subsets of individuals, how can they best be combined? Can including related individuals in the analysis help in the identification of causal variants, especially if these are rare? These problems will be discussed in further detail, with time for questions, and any suggestions of answers! 



Jan 30  Wed  James Norris and Jean Bertoin (Sheffield Probability Day) (Cambridge and ETH Zurich)  Statistics Seminar  
14:15  James Norris (Cambridge), 2.15 pm: A consistency estimate for Kac's model of elastic collisions in a dilute gas.
Jean Bertoin (ETH Zurich), 3.45 pm, The 2012 Applied Probability Trust Lecture: Almost giant clusters for percolation on large trees with logarithmic heights. 

LT 7  
Abstract: Abstract for James Norris's talk: Kac's process is a natural stochastic particle model, of mean field type, for the evolution of particle velocities under elastic collisions. Formally this should converge to the spatially homogeneous Boltzmann equation in the large particle number limit. In one of the physically interesting cases, namely hard sphere collisions, this was proved by Sznitman. We will discuss a new proof of this result, which leads to some quantitative refinements, based on the simple approach of treating the martingale decomposition for linear functions of Kac's process as a random perturbation of Boltzmann's equation. Abstract for Jean Bertoin's talk: We consider Bernoulli bond percolation on a tree with size $n\gg 1$, with a parameter $p(n)$ that depends on the size of that tree. Our purpose is to investigate the asymptotic behavior of the sizes of the largest clusters for appropriate regimes. We shall first provide a simple characterization of tree families and percolation regimes which yield giant clusters, answering a question raised by David Croydon. In the second part, we will review briefly recent results concerning two natural families of random trees with logarithmic heights, namely recursive trees and scale-free trees. We shall see that the next largest clusters are almost giant, in the sense that their sizes are of order $n/\ln n$, and obtain precise limit theorems in terms of certain Poisson random measures. A common feature in the analysis of percolation for these models is that, even though one addresses a static problem, it is useful to consider dynamical versions in which edges are removed, respectively vertices are inserted, one after the other in a certain order as time passes. 



Feb 7  Thu  Amy Baddeley and Stefan Blackwood  Statistics Seminar  
14:00  Amy Baddeley:
Using Bayes Factors to analyse finemapped genotype data
Stefan Blackwood: Partially observed systems 

K14  
Abstract: Abstract for Amy Baddeley's talk: Recent developments in genetic analysis mean that we have been able to identify many associations between genetic variants and common diseases. However, it is likely that most of the variants identified so far are not actually the causal variants, but are in fact confounders. Now the priority is shifting to identifying the causal variant in a disease association region (fine-mapping). Methods utilised in published studies to identify causal variants include the likelihood ratio (LR) and other frequentist methods. However, high levels of correlation, rare causal variants and those with small effect sizes mean such analyses may not work in all situations. The restrictive effects of these may be partially countered by incorporating functional biological information into an analysis. I will begin by giving a brief introduction to the genetic setting of the problem and the problem itself. I will then outline a general framework of analysis, "filtering", and the main method that will be presented uses the Bayes Factor (BF) in this framework. The BF is the ratio of the probability of the data under the alternative and null hypotheses, with a larger value indicating more evidence in favour of the alternative hypothesis. I will show the results of analyses using realistic simulated datasets and explore using fairly uninformative priors compared to using priors based on functional data. Our results indicate that BFs are a promising tool for incorporating functional information into fine-mapping studies. Abstract for Stefan Blackwood's talk: Suppose you have a random system which is not directly observable; instead you have a sequence of partial observations. Using the information gathered from these observations, what can we infer about the underlying system? Using stochastic models to make these deductions is known as stochastic filtering. 
During this talk I will provide a brief account of linear and nonlinear stochastic filtering in the presence of Lévy noise and their respective cornerstones: the Kalman-Bucy filter and the Zakai equation. 
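The Bayes factor defined in Amy's abstract can be made concrete with a toy conjugate example: testing a binomial success probability p = 0.5 against a uniform prior on p. This illustration is mine, not taken from the talk.

```python
from math import comb

def bayes_factor_binomial(k, n):
    """Bayes factor for H1: p ~ Uniform(0, 1) against H0: p = 0.5,
    given k successes in n binomial trials.  The marginal likelihood
    of the data is 1 / (n + 1) under H1 (beta-binomial with a flat
    prior) and C(n, k) * 0.5**n under H0."""
    m1 = 1.0 / (n + 1)
    m0 = comb(n, k) * 0.5 ** n
    return m1 / m0
```

Values above 1 favour H1: for example, nine successes in ten trials gives a BF well above 1, whereas five successes in ten gives a BF below 1, favouring the null.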



Feb 14  Thu  Elizabeth Boggis and Samuel Touchard  Statistics Seminar  
13:30  Elizabeth Boggis: Exploiting Bayesian Shrinkage within a Linear Model Framework to identify Exome Sequence Variants associated with Gene Expression; Samuel Touchard: MicroRNA predictions using Bayesian graphical models 

K14  
Abstract: Elizabeth's Abstract: Next-generation exome sequencing identifies thousands of DNA sequence variants in each individual. Methods are needed that can effectively identify which of these variants are associated with changes in gene expression. The Normal-Gamma prior has been shown to induce effective and flexible shrinkage in the Bayesian linear model framework (Griffin and Brown 2010). Using simulated data we assess the efficacy and limitations of this Bayesian shrinkage framework in parsimoniously identifying such sequence variants. We further develop a Bayesian linear model to include the uncertainty in gene expression; SNP functional information obtained from online databases; and the uncertainty in the allele calls as quantified by the quality score. Samuel's Abstract: In this presentation we describe miRNA networks for patients suffering from Acute Coronary Syndrome (ACS). miRNAs are non-coding RNAs that regulate gene expression. We are interested in building an association network which will identify (with quantifiable uncertainty) miRNAs that regulate particular genes (or groups of genes), thus providing important information on gene functionality or dysfunctionality. Data were collected consisting of gene expression levels of miRNAs and mRNAs of patients who suffer from ACS. RNA was extracted from blood samples at two time points, and expression levels were quantified with Affymetrix GeneChip arrays and normalised using the puma package for microarray data analysis. The method is broken down into three stages. In the first stage a dimensionality reduction is performed: using TargetScan association scores the miRNA expressions are narrowed down, as are the gene expressions by using distance similarity procedures such as clustering and latent process decomposition. In the second stage a Bayesian graphical model is proposed, according to which associations of gene expressions and miRNA expressions are inferred and an association matrix is extracted. 
The methodology uses simulation-based methods, in particular Markov chain Monte Carlo, and benefits from managing uncertainty across a complex network. Finally, in the third stage, the network is constructed using the association matrix. Some extensions of this model will be discussed. 



Feb 21  Thu  Steven Perkins (Bristol)  Statistics Seminar  
14:00  Stochastic Fictitious Play with Continuous Action Sets  
K14  
Abstract: Stochastic approximation is a widely used tool which allows the limiting behaviour of stochastic, discrete-time learning procedures on $\mathbb{R}^K$ to be studied using an associated continuous-time, deterministic dynamical system. We extend the asymptotic pseudo-trajectory approach to stochastic approximation so that the processes can take place on any Banach space. This allows us to consider an iterative process of probability measures (or probability densities) on a compact subset of $\mathbb{R}$, as opposed to the regular stochastic approximation framework, which is limited to probability mass functions on $\mathbb{R}^K$. A common application of stochastic approximation in game theory is to study the limiting behaviour of a discrete-time learning algorithm, such as stochastic fictitious play, in normal form games. However, whilst learning dynamics in normal form games are now well studied, it is only recently that their continuous-action-space counterparts have been examined. Our Banach space stochastic approximation framework shows that in a continuous action space game the limiting behaviour of stochastic fictitious play can be studied using the associated smooth best response dynamics on the space of finite signed measures. We show that stochastic fictitious play will converge to an equilibrium point in single-population negative definite games, two-player zero-sum games and $N$-player potential games, when they have Lipschitz continuous rewards over a compact subset of $\mathbb{R}$. 
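To give a concrete feel for stochastic fictitious play, here is a finite-action sketch (the classical $\mathbb{R}^K$ setting the talk generalises, not its Banach-space extension): two players in matching pennies, a two-player zero-sum game, each playing a logit-smoothed best response to the opponent's empirical play. The smoothing parameter and run length are invented for the example.

```python
import numpy as np

# Matching pennies: payoff matrix for player 1 in a two-player zero-sum game.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def logit_response(u, eta=0.1):
    """Smoothed (logit) best response to a vector of expected payoffs."""
    z = np.exp(u / eta)
    return z / z.sum()

rng = np.random.default_rng(1)
emp1 = np.array([0.5, 0.5])  # empirical action frequencies of player 1
emp2 = np.array([0.5, 0.5])  # empirical action frequencies of player 2
for n in range(1, 5001):
    # Each player samples a smoothed best response to the opponent's empirical play.
    a1 = rng.choice(2, p=logit_response(A @ emp2))
    a2 = rng.choice(2, p=logit_response(-(A.T @ emp1)))  # player 2's payoffs are -A
    emp1 += (np.eye(2)[a1] - emp1) / (n + 1)
    emp2 += (np.eye(2)[a2] - emp2) / (n + 1)
```

In this game the unique equilibrium mixes 50/50, and the empirical frequencies settle near it, illustrating the zero-sum convergence result in the finite-action case.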



Feb 28  Thu  Marian Farah (MRC Cambridge)  Statistics Seminar  
14:00  Bayesian Emulation and Calibration of a Dynamic Epidemic Model for H1N1 Influenza  
K14  
Abstract: Increasingly, mechanistic epidemic models are playing an important role in strategies for epidemic management. In the attempt to control an epidemic, the goal of model development is to provide efficient estimation of model parameters to allow timely assessment and prediction of the epidemic evolution as new data become available. In this work, we address the problem of efficient parameter estimation in the context of a model for H1N1 influenza, implemented as a dynamic computer simulator. We propose an efficient approximation to the dynamic simulator using an emulator, a statistical model, that combines a Gaussian process prior for the output function of the simulator with a dynamic linear model for its evolution through time. This modelling framework is both flexible and tractable, resulting in efficient posterior inference through Markov Chain Monte Carlo. We illustrate the proposed methodology using simulated H1N1 influenza epidemic data. 
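The core emulation idea above can be sketched in a few lines of numpy: condition a Gaussian process on a handful of simulator runs and predict the output elsewhere. Here `np.sin` stands in for an expensive simulator, and the squared-exponential kernel and its hyperparameters are invented; the talk's dynamic emulator for an epidemic simulator is considerably richer.

```python
import numpy as np

def rbf(x1, x2, ell=0.5, sf=1.0):
    """Squared-exponential covariance between two 1-D input vectors."""
    d = x1[:, None] - x2[None, :]
    return sf ** 2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_emulate(x_train, y_train, x_test, jitter=1e-8):
    """Posterior mean and variance of a zero-mean GP given training runs."""
    K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov_term = Ks @ np.linalg.solve(K, Ks.T)
    var = rbf(x_test, x_test).diagonal() - cov_term.diagonal()
    return mean, var

simulator = np.sin                    # stand-in for an expensive simulator
x_train = np.linspace(0.0, np.pi, 8)  # a small design of simulator runs
x_test = np.array([1.0, 2.0])
mean, var = gp_emulate(x_train, simulator(x_train), x_test)
```

With only eight runs the emulator reproduces the smooth "simulator" closely at new inputs, which is exactly why emulation makes calibration affordable.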



Mar 7  Thu  Dennis Prangle (Lancaster)  Statistics Seminar  
14:00  Summary statistics for likelihood-free model choice 
LT C  
Abstract: A central statistical goal is to choose between alternative explanatory models. This work is motivated by population genetic models, which are typically complicated stochastic processes whose likelihoods are numerically intractable. Hence it is not possible to use statistical methods based on evaluating likelihood functions. Approximate Bayesian computation (ABC) is a commonly used likelihood-free method for such situations. ABC simulates data for many parameter values under each model and compares these to the observed data. The comparison is based on vectors of summary statistics of the data. More weight is given to models which produce simulated vectors close to that for the observations. The choice of summaries turns out to be crucial to the efficiency and accuracy of the inference algorithm. This talk presents a method to select good summary statistics for ABC model choice. An application is also presented, choosing between demographic models of Campylobacter jejuni, a bacterial pathogen responsible for a large proportion of gastroenteritis cases. 
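The ABC model-choice recipe described above can be sketched with rejection sampling on an invented toy problem (Poisson versus shifted-geometric data with a common mean, using the sample mean and variance as summaries), rather than the genetic application; the priors, tolerance and run length are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
obs = rng.poisson(3.0, size=100)  # "observed" data, truly from the Poisson model

def summaries(x):
    return np.array([x.mean(), x.var()])

s_obs = summaries(obs)
accepted = []
for _ in range(20000):
    m = rng.integers(2)            # model prior: 0 = Poisson, 1 = shifted geometric
    lam = rng.uniform(0.1, 10.0)   # prior on the common mean
    if m == 0:
        x = rng.poisson(lam, size=100)
    else:
        # shifted geometric on {0, 1, ...} with mean lam (variance lam*(1+lam))
        x = rng.geometric(1.0 / (1.0 + lam), size=100) - 1
    # accept when the simulated summaries land close to the observed ones
    if np.linalg.norm(summaries(x) - s_obs) < 0.5:
        accepted.append(m)

post_poisson = accepted.count(0) / len(accepted)
```

Because the variance summary separates the two models (the geometric is overdispersed), the accepted sample should heavily favour the Poisson model, illustrating why the choice of summaries matters.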



Mar 14  Thu  Keith Worden (Sheffield  Mechanical Engineering)  Statistics Seminar  
14:00  Applications of Probability and Statistics in Structural Dynamics  
LT C  
Abstract: Probability and statistics are vital tools in the modern analysis of structural dynamic systems. This is partly because many of the forces which excite the structures we are interested in are random and partly because many of the measurements and processes we study are (sometimes extremely) uncertain. This talk will present some applications of probability and statistics made in the Dynamics Research Group in Sheffield in recent years. Topics covered may include the design of damage detection systems based on statistical pattern recognition; removal of artefacts from data using concepts from econometric time series analysis; Bayesian sensitivity analysis of large nonlinear models and modelling of nonlinear dynamical systems using Markov Chain Monte Carlo methods. 



Mar 21  Thu  John Stevens (ScHaRR)  Statistics Seminar  
14:00  Health Technology Assessment: A Day in the Life of a HEDS Statistician  
K14  
Abstract: Health technology assessment (HTA) typically involves comparing the population mean costs and benefits of two or more interventions. The assessment is done using a decision analytic model over a lifetime horizon, which gives rise to structural and parameter uncertainty. After introducing the current decision rule based on the incremental cost-effectiveness ratio, we will discuss some of the statistical issues involved in an HTA, such as making comparisons between treatments that have not been compared in randomised controlled trials (RCTs); the extrapolation of evidence beyond the duration of a trial to estimate population mean survival; modelling non-fatal events such as the development of Type 2 diabetes; and modelling bivariate outcomes such as progression-free survival and death. In some cases, methods are available that are not well known in the health economics literature, whilst others depend on the format of the data and the amount of data that is available. 



Apr 11  Thu  Heather Battey (Bristol)  Statistics Seminar  
14:00  Nonparametric estimation of a multidimensional density: some recent theory and methodology.  
K14  
Abstract: Density estimation is one of the most actively studied challenges in statistics. Whilst fully agnostic estimators can be appealing in low dimensions, the performance of such estimators deteriorates rapidly for a fixed sample size as the number of dimensions grows. This provides motivation for estimating within a restricted subset of the set of all p-dimensional Lebesgue densities, thereby reducing estimation error, even if this produces some approximation error when the constraint is not satisfied. In the first half of the talk, I will consider the restriction to the class of p-dimensional elliptic densities and, within this framework, present a two-stage nonparametric estimator for the Lebesgue density based on Gaussian mixture sieves. Under the online Exponentiated Gradient (EG) algorithm of Helmbold et al. (1997), and without restricting the mixing measure to have compact support, the estimator produces estimates converging uniformly in probability to the true elliptic density at a rate that is independent of the dimension of the problem. The rate performance (and optimal tuning parameter) associated with our estimator depends on the tail behaviour of the underlying density rather than on smoothness properties, and we provide a rule of thumb for estimating the relevant quantity based on observables. Although the rule of thumb is based on a particular member of the elliptic class, simulations indicate that the procedure generalises to other members of this class. In the second half of the talk, I will present some ongoing work on multidimensional density estimation. I will introduce a new class of procedures that are attractive in that they offer both flexibility and the possibility of incorporating constraints, whilst possessing a succinct representation which may be stored and evaluated easily. The latter property is of paramount importance when dealing with large datasets, which are now commonplace in many application areas. 
In a simulation study, we show that our approach performs well across a range of data generating mechanisms, and can often outperform popular nonparametric estimators. Moreover, its performance is shown to be robust to the choice of tuning parameters, which is an important practical advantage of our procedure. The estimator is implemented in a binary classification task arising in medical statistics. 



Apr 18  Thu  Jochen Einbeck (Durham)  Statistics Seminar  
14:00  Principal curves and surfaces: Data visualization, compression, and beyond  
K14  
Abstract: Principal curves and surfaces were proposed about two decades ago as a tool for nonlinear dimension reduction. Descriptively, they can be defined as smooth objects (of dimension 1 and 2, respectively) capturing the "middle" of a (potentially high-dimensional) data cloud. Though a relatively large amount of literature has discussed methods and algorithms for the estimation of principal curves and surfaces, most of this research stops here, and does not consider exploiting the fitted curve or surface once it is established. One may find this surprising, as the parametric analogue, linear principal component analysis, is rarely used as an end in itself, but unfolds its power only when used as an integrated data compression step for some high-dimensional, say, regression or classification problem. One reason for this reluctance may be that several rather cumbersome technicalities, such as the computation of distances or projection indexes, need to be solved before a fitted principal curve or surface can be used for further inferential purposes such as regression or classification. In this talk, we describe briefly how such problems can be resolved, and give some examples, stemming from current collaborative work, which illustrate how "local" principal curves and surfaces can be efficiently used as a nonparametric dimension reduction tool, enabling further statistical analysis based on the fitted principal object. We will focus on a case study involving the compression of the thermochemical state space of chemical combustion systems. 



Apr 25  Thu  Alex Mijatovic (Imperial)  Statistics Seminar  
14:00  A new look at short-term implied volatility in asset price models with jumps 
K14  
Abstract: This talk discusses the implied volatility smile for options close to expiry in the exponential Lévy class of asset price models with jumps. We introduce a new renormalisation of the strike variable with the property that the implied volatility converges to a nonconstant limiting shape, which is a function of both the diffusion component of the process and the jump activity (Blumenthal-Getoor) index of the jump component. Our limiting implied volatility formula relates the jump activity of the underlying asset price process to the short end of the implied volatility surface and sheds new light on the difference between finite and infinite variation jumps from the viewpoint of option prices: in the latter case, the wings of the limiting smile are determined by the jump activity indices of the positive and negative jumps, whereas in the former, the wings have a constant model-independent slope. This result gives a theoretical justification for the preference of infinite variation Lévy models over finite variation ones in calibration based on short-maturity option prices. 



May 2  Thu  Christopher Hunter (Sheffield  Chemistry)  Statistics Seminar 
14:00  
K14  


May 9  Thu  Idris Eckley (Lancaster)  Statistics Seminar  
14:00  Coherence analysis of multivariate time series  
K14  
Abstract: Data collection systems are widely used within our everyday lives. For example, within the energy sector they are used to record process activity at energy generation sites. These loggers are capable of sampling data at high rates, at a number of locations, and recording multiple process aspects at each location. Such series are typically non-stationary in nature, with potentially time-varying dependence between the various series components. In this talk we consider the problem of modelling and estimating the coherence structure within such time series. In particular we focus on the challenge of identifying whether the dependence between a pair of components is direct or indirectly driven by other components of the series, illustrating our approach using examples taken from neuroimaging and wind energy. 



May 16  Thu  Peter Moerters (Bath)  Statistics Seminar  
14:00  Clustering in spatial preferential attachment networks  
K14  
Abstract: I define a class of growing networks in which new nodes are given a spatial position and are connected to existing nodes with a probability mechanism favouring short distances and high degrees. The competition of preferential attachment and spatial clustering gives this model a range of interesting properties. Empirical degree distributions converge to a limiting power law, and the average clustering coefficient of the networks converges to a positive limit. A phase transition occurs in the global clustering coefficients and empirical distribution of edge lengths. The talk is based on joint work with Emmanuel Jacob (ENS Lyon). 



Oct 3  Thu  Stephen Connor (York)  Statistics Seminar  
14:00  Mixing time for a random walk on a ring  
Abstract: We consider a variant of a process used in random number generation, and previously studied by Chung, Diaconis and Graham. This is a random walk on the integers mod n (n odd), which at each step either increments by 1 or doubles its value, but where the probability of doubling is a decreasing function of n. We use a mixture of representation theory and probability to show that the total variation distance for this process exhibits a cutoff phenomenon. This is joint work with Michael Bate (York). 
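The walk itself is easy to simulate; the sketch below estimates the total variation distance to the uniform distribution on the integers mod n by Monte Carlo, using a fixed (invented) doubling probability rather than the decreasing-in-n choice analysed in the talk.

```python
import numpy as np

def tv_to_uniform(n, steps, p_double, reps=20000, seed=3):
    """Monte Carlo estimate of the TV distance to uniform on the integers mod n."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps, dtype=np.int64)  # reps independent copies of the walk
    for _ in range(steps):
        double = rng.random(reps) < p_double
        # either double the value or increment by 1, both mod n
        x = np.where(double, (2 * x) % n, (x + 1) % n)
    freq = np.bincount(x, minlength=n) / reps
    return 0.5 * np.abs(freq - 1.0 / n).sum()

n = 101
early = tv_to_uniform(n, steps=5, p_double=0.2)   # far from uniform
late = tv_to_uniform(n, steps=400, p_double=0.2)  # essentially mixed
```

After a few steps the walk can only have reached a handful of values, so the distance is near 1; after many steps the estimate drops to the Monte Carlo noise floor, consistent with an abrupt cutoff somewhere in between.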



Oct 9  Wed  Andreas Kyprianou (Bath)  Statistics Seminar  
14:15  Censored Stable Processes  
LT6  
Abstract: We look at a general two-sided jumping strictly alpha-stable process where alpha is in (0,2). By censoring its path each time it enters the negative half line we show that the resulting process is a positive self-similar Markov process. Using Lamperti's transformation we uncover an underlying driving Lévy process and, moreover, we are able to describe in surprisingly explicit detail the Wiener-Hopf factorization of the latter. Using this Wiener-Hopf factorization together with a series of spatial path transformations, it is now possible to produce an explicit formula for the law of the original stable process as it first *enters* a finite interval, thereby generalizing a result of Blumenthal, Getoor and Ray for symmetric stable processes from 1961. This is joint work with Juan Carlos Pardo and Alex Watson. 



Oct 9  Wed  Thomas Mikosch (Copenhagen)  Statistics Seminar  
15:45  Power Law Tails in Applied Probability  Some Recent Developments. [The 2013 Applied Probability Trust Lecture]  
Abstract: For many decades, regular variation has been a useful tool in various areas of applied probability theory, including queuing, branching, renewal theory, stochastic networks, time series analysis, extreme value theory and insurance. Regularly varying tails (i.e., distributions with power law tails) naturally appear as limits for normalized and centered maxima and sums of independent and identically distributed random variables, or as a domain of attraction condition for such limit laws. However, models whose components have power law tails are not always motivated by asymptotic theory; regular variation is a convenient way of describing unusually large values, for example, catastrophic claims in an insurance portfolio, large and long transmission times in the Internet, big losses/gains on the stock market, etc. Since the encyclopedia Regular Variation by N. Bingham, C. Goldie and J. Teugels (Cambridge UP) appeared in 1987, various extensions and modifications of regular variation have been successfully developed and applied. In this talk, we consider some newer developments. These include the notion of a regularly varying time series (i.e., the finite-dimensional distributions of such a series have power law tails), functional regular variation of stochastic processes, random fields and random sets, and large deviations of regularly varying structures. 



Oct 10  Thu  Ziyad Alhussain (Sheffield)  Statistics Seminar  
14:00  Eliciting beliefs about a variance parameter  
Hicks Seminar Room J11  
Abstract: In eliciting an expert's opinion, we ask the expert to report judgements about the observable quantity. We then fit those judgements to a probability distribution that best describes the expert's beliefs. One of the challenges in elicitation is making direct judgements about the variance parameter of the normal distribution. Hence, we aim to find an elicitation method that best fits the expert's opinion about the variance to a probability distribution. In this talk, I will present two elicitation methods that attempt to fit the expert's judgements about the variation of normally distributed data to a probability distribution. The first method depends on Bayes' theorem, where the expert is asked to update the initial judgements given hypothetical data. We then illustrate that the expert may find it difficult to update judgements using Bayes' theorem. Therefore, we propose an elicitation method that does not depend on Bayes' theorem, is easier to use, and works under both conjugate and non-conjugate prior distributions. We conclude with an interactive example using a proposed software tool. 
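As a simple illustration of fitting elicited judgements to a probability distribution (not either of the methods presented in the talk), one might match a lognormal prior for a standard deviation to an expert's stated median and 95th percentile; the judgements below are hypothetical.

```python
import numpy as np

def lognormal_from_quantiles(median, p95):
    """Parameters (mu, sigma) of a lognormal matching a median and 95th percentile."""
    z95 = 1.6448536269514722  # standard normal 0.95 quantile
    mu = np.log(median)
    sigma = (np.log(p95) - mu) / z95
    return mu, sigma

# Hypothetical expert judgements about a standard deviation parameter:
# "the median is about 2, and I'd be 95% sure it is below 5".
mu, sigma = lognormal_from_quantiles(median=2.0, p95=5.0)

# Check the fit by sampling from the implied prior.
rng = np.random.default_rng(4)
draws = np.exp(rng.normal(mu, sigma, size=200000))
```

Matching two quantiles pins down the two lognormal parameters exactly, so sampled quantiles reproduce the elicited judgements up to Monte Carlo error.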



Oct 10  Thu  Fatimah Aloef (Sheffield)  Statistics Seminar  
14:00  Bayesian experimental design in health economics  
Hicks Seminar Room J11  
Abstract: In health economics, health care evaluation refers to identifying, measuring, valuing and comparing the costs as well as the benefits of different health care interventions, in order to allocate limited health resources wisely. Cost-Effectiveness Analysis (CEA) has been the most widely used method to derive such allocation decisions, especially those made by the National Institute for Health and Clinical Excellence (NICE) in the UK. This evaluation technique uses Quality Adjusted Life Years (QALYs) as an outcome measure in order to compare different health care interventions directly. There are different techniques to measure the "Q" part of this quantity, which reflects the quality of life for health outcomes, namely the utility. Recently, there has been increased interest in using Discrete Choice Experiments (DCEs) to elicit health state utilities as an alternative to the cardinal methods. Utilities are required for all health states defined by a classification system. However, discrete choice data are collected for only a subset of health states, and a model is then fitted to estimate the utilities for any health state defined by the classification system. Thus, an optimal choice design is required to estimate the utilities within the QALY framework precisely. In this talk I will consider the problems of constructing choice designs for health evaluation purposes: in particular, anchoring the health utility values produced by the DCE onto the 0-1 (dead to full health) scale used within the QALY framework, the dependence of the optimal choice design on the unknown parameters of the choice model, and simplifying the choice task and its effect on design efficiency. The experimental design used in our work is illustrated through a pairwise comparison for a practical health example, the AQL-5D classification system. 



Oct 17  Thu  Dennis Prangle (Lancaster)  Statistics Seminar  
14:00  Summary statistics for likelihood-free model choice 
Abstract: A central statistical goal is to choose between alternative explanatory models. This work is motivated by population genetic models, which are typically complicated stochastic processes whose likelihoods are numerically intractable. Hence it is not possible to use statistical methods based on evaluating likelihood functions. Approximate Bayesian computation (ABC) is a commonly used likelihood-free method for such situations. ABC simulates data for many parameter values under each model and compares these to the observed data. The comparison is based on vectors of summary statistics of the data. More weight is given to models which produce simulated vectors close to that for the observations. The choice of summaries turns out to be crucial to the efficiency and accuracy of the inference algorithm. This talk presents a method to select good summary statistics for ABC model choice. An application is also presented, choosing between demographic models of Campylobacter jejuni, a bacterial pathogen responsible for a large proportion of gastroenteritis cases. 



Oct 24  Thu  Lindsey Lee (School of Earth and Environment  Leeds University)  Statistics Seminar 
14:00  Statistical Methods for Understanding Uncertainty in a Global Aerosol Model  
Hicks Seminar Room J11  
Abstract: Uncertainty is inherent in the modelling of complex processes associated with climate science. Model uncertainty arises in any computer model that is restricted in terms of computational power and current knowledge but can broadly be defined in terms of input, parametric and structural uncertainty. Structural uncertainty can be considered by comparing outputs from different computer models. A lot of progress has been made in quantifying the effect of structural uncertainty on aerosol model predictions through the AEROCOM project. We have made progress in the quantification and understanding of parametric and input uncertainty by application of statistical methods in the NERC AEROS project. In this talk I will explain the statistical methods that have been applied in the AEROS project to help us understand and quantify parametric uncertainty in the GLOMAP aerosol model. These methods include expert elicitation, experimental design, emulation and sensitivity analysis. I will then show some of the results we have from applying these methods to study 28 uncertain parameters (and emissions) and their effects on GLOMAP model predictions. 



Oct 31  Thu  Michael SalterTownshend (University College Dublin)  Statistics Seminar  
14:00  Modelling Multiple Social Relations  
Hicks Seminar Room J11  
Abstract: Social network analysis is the rapidly expanding field that deals with interactions between individuals or groups. The literature has tended to focus on single network views, i.e. networks comprised of a group of nodes with a single type of link between node pairs. However, nodes may interact in different ways with the same alters. For example, on Twitter one user may retweet, follow, list or message another user. There are thus four separate networks to consider. Current approaches include examining all network views independently or aggregating the different views into a single super network. Neither of these approaches is satisfying, as the interaction between relationship types across network views is not explored. We are motivated by an example consisting of a census of 75 villages in the Karnataka province in India. The data were collated for use by a microfinance company and 12 different link types are recorded. We develop a novel method for joint modelling of multi-view networks as follows: we begin with the popular latent space model for social networks and then extend the model to multi-view networks through the addition of a matrix of interaction terms. The theory behind this extension is due to emerging work on Multivariate Bernoulli models. We first present the theory behind our new model. We then explore the relationship between the interaction terms and the correlation of the links across network views, and finally we present results for the Karnataka dataset. Inference is a challenge, and we adopt the No-U-Turn sampler, a variant of Hamiltonian Monte Carlo, for Bayesian inference. 



Nov 7  Thu  Peter Neal (Lancaster)  Statistics Seminar  
14:00  MCMC for a birth-death-mutation (BDM) model 
Hicks Seminar Room J11  
Abstract: A birth-death-mutation (BDM) model has been used by a number of authors to model the evolution of a tuberculosis epidemic in San Francisco in the early 1990s. The observed data are assumed to be a cross-sectional study of the tuberculosis outbreak. It is impossible to write down the likelihood for the model without substantial, non-trivial data augmentation, which prohibits the use of standard MCMC algorithms. However, it is trivial to simulate a realisation of the BDM model, and ABC algorithms have been used to estimate the parameters of the BDM model. Starting from the ABC perspective that simulation is straightforward, we construct an MCMC algorithm which uses simulation. Specifically, we use a non-centered parameterisation which enables us to treat the simulation process as a data augmentation problem and takes similar amounts of time per iteration as the ABC algorithms. The MCMC algorithm is successfully applied to the San Francisco tuberculosis data. 
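Forward simulation of a BDM model really is straightforward, as the abstract notes. Below is a minimal Gillespie-style sketch in which each individual gives birth, dies, or mutates to a brand-new type at constant per-capita rates; the rates and horizon are invented and the sketch is not the specific parameterisation used in the tuberculosis analyses.

```python
import numpy as np

def simulate_bdm(lam=1.0, mu=0.3, theta=0.2, t_max=5.0, cap=5000, seed=5):
    """Forward-simulate a birth-death-mutation process; returns a list of types."""
    rng = np.random.default_rng(seed)
    types = [0]      # one founding individual, of type 0
    next_type = 1
    t = 0.0
    while types:
        n = len(types)
        # total event rate is n * (lam + mu + theta); wait an exponential time
        t += rng.exponential(1.0 / (n * (lam + mu + theta)))
        if t > t_max or n >= cap:
            break
        i = rng.integers(n)                    # a uniformly chosen individual...
        u = rng.random() * (lam + mu + theta)  # ...experiences one of three events
        if u < lam:
            types.append(types[i])  # birth: offspring inherits the parent's type
        elif u < lam + mu:
            types.pop(i)            # death
        else:
            types[i] = next_type    # mutation to a brand-new type
            next_type += 1
    return types

pop = simulate_bdm()
pop_no_death = simulate_bdm(mu=0.0, t_max=3.0)  # with no deaths the line survives
```

A cross-sectional sample of `pop` (the type counts at the end time) is the kind of data the likelihood-free and simulation-based MCMC approaches both condition on.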



Nov 21  Thu  Axel Finke (Warwick)  Statistics Seminar  
14:00  Static-parameter estimation in piecewise deterministic processes using particle Gibbs samplers 
Hicks Seminar Room J11  
Abstract: We give a brief introduction to recent advances in sequential Monte Carlo and pseudo-marginal MCMC methods, as well as to piecewise deterministic processes (PDPs). The latter form a class of stochastic processes that jump randomly at a countable number of stopping times but otherwise evolve deterministically in continuous time. We then develop a particle Gibbs sampler for static-parameter estimation in PDPs that are observed only partially, noisily and in discrete time. We present a reformulation of the original particle filter for PDPs. This permits the use of a variance-reduction technique known as ancestor sampling that greatly improves mixing of the particle Gibbs chain. We compare our method with a particle Gibbs sampler based on the variable rate particle filter. Our approach is further illustrated on a shot-noise Cox process model that has applications in finance. This is joint work with Adam Johansen and Dario Spanò. 



Nov 28  Thu  Marton Balazs (Bristol)  Statistics Seminar  
14:00  Anomalous fluctuations in one-dimensional interacting systems 
Hicks Seminar Room J11  
Abstract: I will describe a family of one-dimensional interacting particle systems that contains the simple exclusion and the zero range processes, and many more. In the stationary distribution the current fluctuations show anomalous scalings; I will sketch parts of the proof of this phenomenon for some of our models. Along the way I will try to make it clear how convexity of a function of central importance leads to such unusual behaviour. The technical point that prevents us from proving anomalous scaling in great generality will also be pointed out. Our methods work with probabilistic arguments and couplings, and hence might give more intuition than alternative existing techniques of heavy combinatorics and analysis. 



Dec 5  Thu  Keith Harris (Sheffield)  Statistics Seminar  
14:00  Bayesian hierarchical models for microbial metagenomics  
Hicks Seminar Room J11  
Abstract: In this talk, we will introduce Dirichlet multinomial mixtures (DMM) for the probabilistic modelling of microbial metagenomics data. This data can be represented as a frequency matrix giving the number of times each taxon is observed in each sample. The samples have different sizes, and the matrix is sparse, as communities are diverse and skewed towards rare taxa. Most methods used previously to classify or cluster samples have ignored all these features. The Dirichlet mixture components cluster communities into distinct ‘metacommunities’ and, hence, determine envirotypes or enterotypes: groups of communities with a similar composition. We applied the DMM model to human gut microbe genera frequencies from obese and lean twins. Our results suggested that obesity is not associated with a distinct microbiota but instead increases the chance that an individual derives from a disturbed enterotype. We will also show how the Dirichlet multinomial framework for defining enterotypes can be adapted to develop a Bayesian approximation to the Unified Neutral Theory of Biodiversity (UNTB) in ecology, which has been proposed as a null model for the structure of microbial communities. The approximation was developed because the existing maximum likelihood based genealogical approach for fitting the multi-site UNTB is too computationally demanding for the large datasets typically encountered in microbiomics. The key to our strategy is the observation that the UNTB is, in the limit of large population sizes, equivalent to the hierarchical Dirichlet process (HDP) in statistics, which can be exploited to derive an efficient Gibbs sampler for the neutral model. We first validated this method by applying it to synthetic data and twenty-nine tropical tree plots from Panama that had already been shown to satisfy the neutral model. We then used it to determine the extent to which gut microbial communities are neutrally assembled. 
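The generative side of a Dirichlet multinomial mixture is easy to sketch: pick a metacommunity, draw a community composition from its Dirichlet component, then draw taxa counts from a multinomial. The concentration parameters, mixture weights and sequencing depth below are invented for illustration; inference (clustering real samples) is the harder direction addressed in the talk.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two metacommunities: Dirichlet concentration vectors over five taxa.
alphas = np.array([[10.0, 5.0, 1.0, 0.5, 0.5],
                   [0.5, 0.5, 1.0, 5.0, 10.0]])
weights = np.array([0.6, 0.4])  # mixture weights over metacommunities

def sample_dmm(n_samples, depth=1000):
    """Draw (metacommunity label, taxa count vector) pairs from the mixture."""
    labels, counts = [], []
    for _ in range(n_samples):
        k = rng.choice(len(weights), p=weights)   # which metacommunity
        p = rng.dirichlet(alphas[k])              # community composition
        counts.append(rng.multinomial(depth, p))  # observed taxa counts
        labels.append(k)
    return np.array(labels), np.array(counts)

labels, counts = sample_dmm(200)
```

Samples from the first metacommunity are dominated by the first taxa and those from the second by the last, which is exactly the structure a DMM clustering would recover from a sparse, uneven count matrix.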



Dec 12  Thu  Rocio Campos (Sheffield)  Statistics Seminar  
14:00  Statistical approach to systems biology and human nutrition: building a novel biological network around metabolic programming of health outcomes influenced by nutrients during lactation  
Hicks Seminar Room J11  
Abstract: Human milk contains a host of bioactive factors including hormones, growth factors, neuropeptides, anti-inflammatory and immunomodulatory components, as well as multiple nutrients such as minerals, vitamins, amino acids and fatty acids. In addition, milk contains known and unknown molecules with important metabolic regulatory functions. Basic milk composition was established in the 1960s, but this knowledge can be improved thanks to novel analytical techniques and systems biology approaches. We now propose a nutrigenomic-based characterization of milk composition in order to obtain a comprehensive view of milk characteristics and its role in infant growth. Moreover, the recent finding of microRNAs, with gene regulatory functions, in human milk is one of the key points that will be studied in this project. This proposal therefore intends to define relationships between molecular milk components and the potential influence of maternal diet on both milk composition and infant growth. Specifically, we will focus on the first two years of life and try to define (according to experimental models already developed in our research groups) potential adulthood predisposition to metabolic diseases, in particular obesity. 



Dec 12  Thu  Martin Legarreta  Statistics Seminar  
14:00  Mapping of badger territories from field data  
Hicks Seminar Room J11  
Abstract: European badgers are animals that defend their territories not only with direct aggression but also through the use of detectable signs such as latrines. The aim of the research is to reconstruct maps of badger territories from data collected through bait-marking, where plastic markers placed in bait have been recovered after excretion and the spatial locations of latrines recorded. Latrines can be classified into three types: hinterland, boundary and outliers, i.e. those resulting from extraterritorial excursions. We have developed a Conditional Outlier Prediction Model which uses logistic regression to estimate the probability that a latrine is an outlier, based on its location, the types of other latrines in the same direction and other covariate information. This research extends previous work by estimating joint probabilities that multiple latrines are outliers and, combined with the Minimum Convex Polygon method, allows the reconstruction of boundaries and quantifies the uncertainty in the reconstruction of a territory. 



Feb 13  Thu  Partha Dey (Warwick)  Statistics Seminar  
14:00  Multiple phase transitions in long-range first-passage percolation on square lattices.  
Hicks Seminar Room J11  
Abstract: We consider a model of long-range first-passage percolation on the $d$-dimensional square lattice in which any two distinct vertices $x, y$ are connected by an edge having exponentially distributed passage time with mean $\|x-y\|^s$, where $s>0$ is a fixed parameter and $\|\cdot\|$ is the $l_1$-norm on $Z^d$. We analyze the asymptotic growth rate of the set $B_t$, which consists of all $x \in Z^d$ such that the first-passage time between the origin $0$ and $x$ is at most $t$, as $t\to\infty$. We show that, depending on the value of $s$, there are four growth regimes:
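The model is straightforward to simulate on a small patch of $Z^2$. The sketch below (grid size and the value of $s$ are illustrative) draws the exponential passage times and computes first-passage times from the origin with Dijkstra's algorithm, giving the growing ball $B_t$.

```python
import heapq
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Small 2D lattice patch; every pair of distinct vertices is joined by an
# edge with exponential passage time of mean ||x - y||_1 ** s.
L, s = 7, 1.5
V = list(itertools.product(range(L), range(L)))

W = {}
for u, v in itertools.combinations(V, 2):
    d = abs(u[0] - v[0]) + abs(u[1] - v[1])      # l1 distance
    W[(u, v)] = W[(v, u)] = rng.exponential(d ** s)

# First-passage times from the origin via Dijkstra on the complete graph.
dist = {v: np.inf for v in V}
dist[(0, 0)] = 0.0
heap = [(0.0, (0, 0))]
done = set()
while heap:
    t, u = heapq.heappop(heap)
    if u in done:
        continue
    done.add(u)
    for v in V:
        if v != u and v not in done and t + W[(u, v)] < dist[v]:
            dist[v] = t + W[(u, v)]
            heapq.heappush(heap, (dist[v], v))

# B_t: the set of vertices reached by time t.
t = 1.0
B_t = [v for v in V if dist[v] <= t]
```

Repeating this over a range of $s$ values and lattice sizes gives a feel for how the growth rate of $B_t$ changes between regimes.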




Feb 20  Thu  Mohamed Shakandli (Sheffield)  Statistics Seminar  
14:00  Particle filtering applied to medical time series  
Hicks Seminar Room J11  
Abstract: This talk concerns the setup and application of particle filtering to medical time series. Considering count time series (such as the number of asthma patients recorded over time), we discuss and propose non-linear and non-Gaussian state space models, in particular dynamic generalized linear models (DGLMs). Inference and forecasting are achieved by employing sequential Monte Carlo methods, also known as particle filters. These are simulation-based methods that can be used for tracking and forecasting dynamical systems subject to both process and observation noise in non-linear and non-Gaussian models. 
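As a minimal sketch of the idea (not the models from the talk), the following bootstrap particle filter tracks the latent log-intensity of a toy Poisson DGLM with an AR(1) state; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy DGLM: latent AR(1) log-intensity x_t, counts y_t ~ Poisson(exp(x_t + 1)).
T, N = 100, 500                   # time steps, particles
phi, sigma = 0.9, 0.3             # illustrative state dynamics

# Simulate synthetic count data.
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + sigma * rng.normal()
y = rng.poisson(np.exp(x + 1.0))

# Bootstrap particle filter.
particles = rng.normal(0.0, 1.0, N)
est = np.zeros(T)
for t in range(T):
    particles = phi * particles + sigma * rng.normal(size=N)   # propagate
    lam = np.exp(particles + 1.0)
    logw = y[t] * (particles + 1.0) - lam      # Poisson log-likelihood, up to a constant
    w = np.exp(logw - logw.max())
    w /= w.sum()
    est[t] = np.dot(w, particles)              # filtered mean of the state
    particles = rng.choice(particles, size=N, p=w)   # multinomial resampling
```

The filtered means `est` track the simulated states `x`; one-step forecasts of the counts follow by propagating the particles a further step and averaging the implied Poisson rates.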



Mar 6  Thu  Penny Watson (ScHaRR, University of Sheffield)  Statistics Seminar  
14:00  The Use of Health Economic Methods in the Development of New Interventions for Systemic Lupus Erythematosus  
Hicks Seminar Room J11  
Abstract: I aim to evaluate alternative trial designs for a new intervention for systemic lupus erythematosus (SLE) from the perspective of a pharmaceutical company. The cost-effectiveness of new treatments for SLE can be evaluated in a cost-effectiveness (CE) simulation describing individual patient disease pathways and the costs and health outcomes associated with them. The CE model for SLE used SLE registry data to describe long-term outcomes, and simulated Phase II trial outcomes to describe treatment efficacy. I developed a Bayesian Clinical Trial Simulation (BCTS) for a Phase III SLE trial to evaluate the value of trials with alternative design characteristics. I describe an analytic method to compare SLE Phase III RCTs with variable sample size and duration of follow-up. The BCTS was used to simulate trial datasets given a particular design specification. The trial data were combined with prior parameters of the CE model to estimate posterior densities for the CE model inputs and update the outcomes of the CE model. Initially, Bayesian updating was performed using Markov chain Monte Carlo (MCMC) simulation in WinBUGS. However, this method would take years to generate results, so an approximation method was used to speed up the analysis. I will present the outcomes of the analysis from 1,600 BCTS iterations and discuss the limitations of value of information analyses for complex diseases. 



Mar 13  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Bayesian calibration for computer models using likelihood emulation  
Hicks Seminar Room J11  
Abstract: I will start by giving a short overview of the field of "Uncertainty Quantification": a variety of problems related to uncertainty in mathematical models of physical systems. I will then present some recent work (in collaboration with Ben Youngman) on calibration: finding model inputs such that the model outputs fit physical observations. Our approach is motivated by a case study involving a natural history model for colorectal cancer patients. The model is stochastic and computationally expensive, which inhibits evaluation of the likelihood function. We use a history matching approach, where we first exclude regions of input space where we can easily identify poor fits. We then construct an "emulator" (a fast statistical approximation) of the likelihood, which is used within importance sampling to sample from the posterior distribution of the computer model inputs. 



Mar 20  Thu  Marina Knight (York)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Apr 3  Thu  Tom Stafford (Psychology, University of Sheffield)  Statistics Seminar  
14:00  Measuring the learning curve (n= 854,064)  
Hicks Seminar Room J11  
Abstract: I will present the results of a study of learning in players of a simple online game. In contrast to the high-precision experimental tasks common in experimental psychology, this study leverages the statistical power gained by having a study population of 854,064 people. Use of game data allowed us to connect, for the first time, rich details of training history with measures of performance from participants engaged for a sustained amount of time in effortful practice. We showed that lawful relations exist between practice amount and subsequent performance, and between practice spacing and subsequent performance. Our methodology allowed an in situ confirmation of results long established in the experimental literature on skill acquisition. Additionally, we showed that greater initial variation in performance is linked to higher subsequent performance, a result we link to the exploration/exploitation trade-off from the computational framework of reinforcement learning. All the raw data and analysis code are available online, an example of "open science". Stafford, T. & Dewar, M. (2014). Tracing the trajectory of skill learning with a very large sample of online game players. Psychological Science, 25(2), 511–518. http://pss.sagepub.com/content/25/2/511 Data and analysis code: https://github.com/tomstafford/axongame 



May 8  Thu  Chris Jackson (MRC Cambridge)  Statistics Seminar  
14:00  Comparing structures of state-transition models for disease progression  
Hicks Seminar Room J11  
Abstract: Stochastic processes representing transitions between discrete states are often used to represent disease progression. Markov models are typical, and they may evolve in either discrete or continuous time. I will discuss the choice between models with different state-transition structures. The models will have some features in common, so that they can be used for the same purpose, such as estimating expected survival. For example, two adjacent states of disease severity could either be merged or separated, and we want to know which gives better estimates of survival. However, if the models are estimated from data at different levels of aggregation, standard likelihood-based model comparison methods do not apply, since the likelihoods are on different scales. In one common situation, the transition probabilities or rates are estimated from a single longitudinal dataset consisting of observations of the states of a number of individuals over time. In this case, a modification of AIC or cross-validation can be used to compare the predictive ability of different models assessed on the data which they have in common. In the models used in health economic evaluations, however, the transition probabilities can typically only be estimated from data aggregated over individuals, or from indirect data. In this case, models with split and merged states can often be compared by defining constraints on the parameters in the larger model. This produces a proxy for the merged model that can be compared against the larger one using standard methods. I will give examples from estimating the progression of health-related quality of life in psoriatic arthritis, and a health economic model for diagnostic tests for coronary artery disease. 



May 15  Thu  Student seminar  Sujunya and Joe (Sheffield)  Statistics Seminar  
14:00  Joe: Reconstructing the timescale of an ice-core
Sujunya: Bayesian Semi-supervised Classification for Satellite Imagery 

Hicks Seminar Room J11  
Abstract: Joe's abstract: The concentrations of various chemicals, particles and gases in ice-cores hold a continuous record of climatic and environmental information dating back hundreds of thousands of years. These data are recorded as a depth series, and in order to interpret them meaningfully we must first learn about their underlying, unobserved timescale. We present a fully Bayesian bivariate approach to obtaining a marginal posterior distribution for the time of year, as well as the date, at any given depth. Sujunya's abstract: The aim of our research is to develop a Bayesian classification model for combining two data sources: multispectral satellite images and field survey data. It is motivated by a practical problem in remote sensing studies where we have a very small labelled sample from a ground survey and a substantial number of unlabelled pixels from satellite images. This problem can be addressed within a semi-supervised framework. We construct a semi-supervised model with mixture distributions as an incomplete-data problem, in which the unlabelled data have unknown classes. We then derive a Bayesian semi-supervised procedure using two-step Gibbs sampling. To evaluate the proposed model, experimental results on real satellite images and simulated data were compared with existing techniques: the maximum likelihood supervised decision rule and semi-supervised classification based on the EM algorithm. The numerical investigation has shown the benefits and limitations of using unlabelled data. In conclusion, I will discuss the strengths and weaknesses of semi-supervised techniques. 



May 22  Thu  Enrico Scalas, Tusheng Zhang (Sussex)  Statistics Seminar  
14:00  Enrico Scalas: On the compound fractional Poisson process
Tusheng Zhang: Strong Convergence of Wong-Zakai Approximations of Reflected SDEs in a Multidimensional General Domain 

LT4  
Abstract: Enrico Scalas: The compound fractional Poisson process (CFPP) is a random walk subordinated to a fractional Poisson process (FPP). The latter is a simple generalisation of the Poisson process in which waiting times between events follow a Mittag–Leffler distribution. Several results on both the CFPP and the FPP will be presented, related to applications in different fields of science. Tusheng Zhang: In this paper, we obtain the strong convergence of Wong-Zakai approximations of reflected stochastic differential equations in a general multidimensional domain, giving an affirmative answer to a question posed by Evans and Stroock in their recent paper. 



Oct 2  Thu  Chris Farmer (Oxford)  Statistics Seminar  
14:00  Ensemble Variational Filters for Sequential Inverse Problems  
LT7  
Abstract: Given a model dynamical system, a model of any measuring instrument relating states to measurements, and a prior assessment of uncertainty, the probability density of subsequent system states, conditioned upon the history of the measurements, is of some practical interest. When measurements are made at discrete times, it is known that the evolving probability density is a solution of the discrete Bayesian filtering equations. This talk describes the difficulties in approximating the evolving probability density using a Gaussian mixture (i.e. a sum of Gaussian densities). In general this leads to a sequence of optimisation problems and high-dimensional integrals. Attention is given to the necessity of using a small number of densities in the mixture, the requirement to maintain sparsity of any matrices and the need to compute first and second derivatives of the misfit between predictions and measurements. Adjoint methods, Taylor expansions, Gaussian random fields and Newton's method can be combined to, possibly, provide a solution. 



Oct 9  Thu  Peter Young (Lancaster)  Statistics Seminar  
14:00  Refined Instrumental Variable Estimation: Maximum Likelihood Optimization of a Unified Box-Jenkins Model  
LT7  
Abstract: For many years, various methods for the identification and estimation of parameters in linear, discrete-time transfer function models have been available and implemented in widely used software environments, such as Matlab. This seminar considers a unified Refined Instrumental Variable (RIV) approach to the estimation of discrete- and continuous-time transfer functions characterized by a unified operator that can be interpreted in terms of backward shift, derivative or delta operators. The paper shows that the resulting iterative RIV algorithm provides a reliable solution to the maximum likelihood optimization equations for an appropriately unified Box-Jenkins transfer function model, and so its en bloc or recursive parameter estimates are optimal in maximum likelihood, prediction error minimization and instrumental variable terms. The backward shift and derivative operator versions of the algorithm are available as the RIVBJ and RIVCBJ routines in the freely available CAPTAIN Toolbox for Matlab, and these have been used for Data-Based Mechanistic (DBM) modelling (see e.g. Young, 2011) in areas ranging from engineering through economics and ecology to the environment. The seminar will describe a recent application where the RIVCBJ routine is used to identify and estimate a differential equation model of the latest globally averaged climate data. P. C. Young (2011). Recursive Estimation and Time-Series Analysis: An Introduction for the Student and Practitioner, Springer-Verlag, Berlin. 



Oct 23  Thu  Claudie Beaulieu (National Oceanography Centre Southampton)  Statistics Seminar  
14:00  Detecting abrupt changes in the Earth’s climate system  
LTD  
Abstract: The Earth’s climate system and ecosystems exhibit abrupt changes and thresholds, which are especially challenging socioeconomically due to the rapidity with which society has to adapt. Changepoint detection techniques provide a valuable tool for the detection of abrupt changes in the climate and ecosystems. In this talk, the usefulness of changepoint detection will be demonstrated through a range of applications. The possibility of anticipating abrupt changes will also be discussed. 
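As a minimal illustration of the kind of technique involved (not a method from the talk), a single abrupt mean shift in a synthetic record can be located with a standardised CUSUM-type statistic:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic record with an abrupt mean shift at time 120 (illustrative).
y = np.concatenate([rng.normal(0.0, 1.0, 120), rng.normal(2.0, 1.0, 80)])

# Estimate the changepoint by maximising the standardised difference in
# means before and after each candidate split k.
n = len(y)
ks = np.arange(1, n)
stats = np.array([
    np.sqrt(k * (n - k) / n) * abs(y[:k].mean() - y[k:].mean())
    for k in ks
])
tau = int(ks[stats.argmax()])      # estimated changepoint location
```

Comparing `stats.max()` against a threshold (from asymptotic theory or a permutation test) decides whether a change is present at all, before trusting `tau`.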



Nov 13  Thu  Joao Domingos Scalon (Department of Exact Sciences – Federal University of Lavras – Brazil)  Statistics Seminar  
14:00  Gibbs Point Processes for modelling spatial distribution of second-phase particles in composite materials  
LTD  
Abstract: Silicon carbide reinforced aluminium alloy composites are typical candidates for engineering applications due to their enhanced mechanical properties over the corresponding aluminium alloys, such as high strength and fatigue resistance. However, these mechanical properties can be highly sensitive to local variations in the spatial distribution of reinforcement particles and, consequently, the analysis of such distributions is of prime importance in materials science. The aim of this seminar is to present Gibbs point processes as an intuitively appealing way of characterizing spatial patterns formed by the locations of second-phase particles in composite materials. 



Nov 20  Thu  Kamila Zychaluk (Liverpool)  Statistics Seminar  
14:00  Semiparametric models for coral reef dynamics  
LTD  
Abstract: There are many mathematical models for the dynamics of coral reefs. Typically, these models assume the functional relationships that are responsible for changes in the reef community, but there is often little evidence on which to choose these relationships. Furthermore, the parameters of such models are difficult to estimate. Instead, we propose a statistical model based on a large amount of data but relatively few assumptions. We use a large database of repeated observations of the composition of coral communities to make predictions about the dynamics of reef composition, and use our model to estimate a regional dynamic equilibrium in reef composition. We have observations of the proportion of space occupied by three components (hard corals, macroalgae, and others), made in consecutive years at Caribbean, Kenyan and Great Barrier Reef sites. We assume that the state of the reef after one year follows a Dirichlet distribution with parameters dependent on the current state of the reef. These parameters are estimated using a local linear estimator with cross-validation bandwidth, and the estimates are then used in a transition equation to obtain the stationary distribution of reef composition. The stationary distributions for the Caribbean and the Great Barrier Reef appear very different, in accordance with biological knowledge. These stationary distributions correspond to the dynamic equilibria for the two regions, if conditions remain as they are now. In addition to making predictions, our semiparametric models provide a summary of the major features of reef dynamics, which more mechanistic models should be able to reproduce. Joint work with Matthew Spencer, Damian Clancy, John F. Bruno and Tim McClanahan. 
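A toy version of the transition step can be sketched as follows. The linear map from the current state to the Dirichlet parameters is purely hypothetical, standing in for the local linear estimator fitted to data in the talk.

```python
import numpy as np

rng = np.random.default_rng(4)

# Next year's composition of (coral, macroalgae, other) ~ Dirichlet with
# parameters depending on the current state, via an illustrative linear map.
def alpha(state, precision=50.0):
    A = np.array([[0.90, 0.10, 0.10],
                  [0.05, 0.80, 0.10],
                  [0.05, 0.10, 0.80]])   # hypothetical transition weights
    return precision * (A @ state)       # columns sum to 1, so alpha sums to `precision`

state = np.array([1/3, 1/3, 1/3])
traj = [state]
for _ in range(500):
    state = rng.dirichlet(alpha(state))
    traj.append(state)

# Averaging late iterations approximates the stationary composition.
stationary = np.mean(traj[100:], axis=0)
```

Running the same chain from different starting compositions and comparing the long-run averages is a quick check that the chain has a single stationary regime.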



Nov 27  Thu  Duncan Lee (Glasgow)  Statistics Seminar  
14:00  Cluster detection and risk estimation for spatiotemporal health data  
LTD  
Abstract: In epidemiological disease mapping one aims to estimate the spatiotemporal pattern in disease risk and identify high-risk clusters, allowing health interventions to be appropriately targeted. Bayesian spatiotemporal models are used to estimate smoothed risk surfaces, but this is contrary to the aim of identifying groups of areal units that exhibit elevated risks compared with their neighbours. Therefore, in this paper we propose a new Bayesian hierarchical modelling approach for simultaneously estimating disease risk and identifying high-risk clusters in space and time. Inference for this model is based on Markov chain Monte Carlo simulation, using the freely available R package CARBayesST that has been developed in conjunction with this paper. Our methodology is motivated by two case studies, the first of which assesses whether there is a relationship between Public Health Districts and colon cancer clusters in Georgia, while the second looks at the impact of the smoking ban in public places in England on cardiovascular disease clusters. 



Dec 4  Thu  John Moriarty (Manchester)  Statistics Seminar  
14:00  A solvable two-dimensional degenerate singular stochastic control problem with non-convex costs  
LTD  
Abstract: This optimisation problem is motivated by a storage-consumption model in an electricity market, and features a stochastic real-valued spot price modelled by Brownian motion. Although the possibility of negative prices makes the cost function neither convex nor concave, we show that the problem is nevertheless solvable and find analytical expressions for the value function, the optimal control and the boundaries of the action and inaction regions. Both boundaries may be interpreted as repelling, although interestingly the well-known smooth fit condition holds at one boundary but not the other. 



Dec 11  Thu  Marina Knight (York)  Statistics Seminar  
14:00  Hurst exponent estimation for long-memory processes using wavelet lifting.  
LTD  
Abstract: Reliable estimation of long-range dependence (LRD) parameters, such as the Hurst exponent, is a well-studied problem in the statistical literature. However, when the observed time series contains missing values or is naturally irregularly sampled, the current literature is sparse, with most approaches requiring heavy modifications. In this talk I shall present a technique for estimating the Hurst exponent of an LRD time series that naturally deals with irregularity in the time domain. The method is based on a flexible wavelet transform built by means of the lifting scheme, and we shall demonstrate its performance. 
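For intuition, a classical regular-sampling baseline is the aggregated-variance estimator, which reads the Hurst exponent off the scaling Var(block mean over m) ∝ m^(2H-2). The sketch below checks it on white noise (H = 0.5); it is not the lifting-based method of the talk, which is what handles irregular sampling.

```python
import numpy as np

rng = np.random.default_rng(5)

# White noise has Hurst exponent H = 0.5; use it as a sanity check.
x = rng.normal(size=2**14)

# Aggregated-variance method: variance of block means at several block sizes m,
# followed by a log-log regression; Var ~ m^(2H - 2).
sizes = 2 ** np.arange(2, 9)
var = [np.var(x[: len(x) // m * m].reshape(-1, m).mean(axis=1)) for m in sizes]
slope = np.polyfit(np.log(sizes), np.log(var), 1)[0]
H = 1 + slope / 2
```

For a persistent LRD series (H > 0.5) the block-mean variances decay more slowly than 1/m, pushing the fitted slope above -1 and the estimate above 0.5.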



Dec 18  Thu  Vincent Bonhomme (Sheffield)  Statistics Seminar  
14:00  
LTD  


Feb 5  Thu  Matt Nunes (Lancaster)  Statistics Seminar  
14:00  Analysis of time series observed on networks  
Hicks Seminar Room J11  
Abstract: In this talk we consider analysis problems for time series that are observed at nodes of a large network structure. Such problems commonly appear in a vast array of fields, such as environmental time series observed at different spatial locations or measurements from computer system monitoring. The time series observed on the network might exhibit different characteristics such as non-stationary behaviour or strong correlation, and the nodal series evolve according to the inherent spatial structure. The new methodology we develop hinges on reducing dimensionality of the original data through a change of basis. The basis we propose is a second generation wavelet basis which operates on spatial structures. As such, the (large) observed data is distilled down to key information on a reduced network topology. We discuss the potential of this dimension reduction method for time series analysis tasks. This is joint work with Marina Knight (University of York) and Guy Nason (University of Bristol). 



Feb 26  Thu  Sayan Banerjee (University of Warwick)  Statistics Seminar  
14:00  Maximal couplings and geometry  
Hicks Seminar Room J11  
Abstract: Maximal couplings are couplings of Markov processes where the tail probabilities of the coupling time attain the total variation lower bound (Aldous bound) uniformly for all time. Markovian couplings are coupling strategies where neither process is allowed to look into the future of the other before making the next transition. These are easier to describe and play a fundamental role in many branches of probability and analysis. Hsu and Sturm proved that the reflection coupling of Brownian motion is the unique Markovian maximal coupling (MMC) of Brownian motions starting from two different points. Later, Kuwada proved that to have a MMC for Brownian motions on a Riemannian manifold, the manifold should have a reflection structure, and thus proved the first result connecting this purely probabilistic phenomenon (MMC) to the geometry of the underlying space. In this work, we investigate general elliptic diffusions on Riemannian manifolds, and show how the geometry (dimension of the isometry group and flows of isometries) plays a fundamental role in classifying the space and the generator of the diffusion for which an MMC exists. We also describe these diffusions in terms of Killing vector fields (generators of rigid motions on manifolds) and dilation vector fields around a point. This is joint work with W.S. Kendall. 



Mar 12  Thu  Kevin Wilson (Strathclyde)  Statistics Seminar  
14:00  Expert judgement informed reliability growth models and the allocation of reliability tasks  
Hicks Seminar Room J11  
Abstract: There are many mathematical models in the literature for how a system’s reliability grows during development as a result of the Test, Analyse and Fix (TAAF) cycle. Most are based on convenient parametric forms and are extensions of simple models such as Poisson Processes. Often we can find one of these parametric models which fits our data well. However, parameters in such models are typically not observable and so eliciting a subjective prior distribution, which is often desirable due to a lack of observed data, is a challenging task. Further, engineers can be rightly sceptical of models based on parameters with no physical interpretation. In this talk we present a model for a reliability growth programme developed with engineering experts in the aerospace industry. All of the model parameters can be elicited from observable quantities and so priors can be specified directly. The model is used to identify an optimal subset of reliability tasks from a large number based on targets for cost, time on test and system reliability. The optimal subset is identified by maximising the prior expectation of a multiattribute utility function. 



Mar 19  Thu  Gwilym Pryce (Sheffield Methods Institute)  Statistics Seminar  
14:00  Urban Inequalities in Exposure to Crime and the Impact on Education  
Hicks Seminar Room J11  
Abstract: This seminar will set out two statistical problems. First, how to measure crime exposure for each residential address in a city and, in particular, how to ascertain the optimal distance decay function for the crime exposure measure. Second, how to estimate the impact of crime exposure on school performance, controlling for other factors. Both questions have important applications. Being able to measure crime exposure for an individual address potentially overcomes the modifiable areal unit problem associated with using averages for administrative areas, and allows us to better understand nuances in the spatial variation in crime and how these change over time. Developing robust measures of crime exposure is also the first step in enabling researchers to better understand the true cost of crime in terms of its impact on a variety of social factors, including educational performance, health, wellbeing, house prices and other life outcomes. The seminar will set out the main methodological challenges as the basis for discussion on how best to design an appropriate research strategy. 



Apr 16  Thu  Nicos Georgiou (Sussex)  Statistics Seminar  
14:00  Geometric aspects of directed last passage percolation on the plane  
Hicks Seminar Room J11  
Abstract: In this talk we present the corner growth model, an infection spreading in an orderly way through the sites of the first quadrant, and explain certain geometric aspects of the infection spread. In particular, we are concerned with understanding the law of large numbers for the infection surface and the microscopic random infinite geodesics associated with the model. This talk is intended for a diverse audience. 



Apr 23  Thu  Student seminar: Christian Fonseca Mora and Jian Wang (Sheffield)  Statistics Seminar  
14:00  Christian: Stochastic partial differential equations with Lévy noise in some
infinite dimensional spaces
Jian: Multivariate Stochastic Volatility Estimation using Particle Filters 

Hicks Seminar Room J11  
Abstract: Christian: In this talk we consider stochastic evolution equations driven by Lévy noise in some infinite dimensional spaces. Such equations are important from a theoretical point of view and also because they have a wide range of applications. The spaces in which these equations take values are called duals of nuclear spaces, and they play an important role in different areas of mathematics, such as partial differential equations, harmonic analysis and probability in infinite dimensional spaces. The talk is intended to be an introduction to the subject and to the main results that we have obtained so far. Jian: This presentation considers a modelling framework for multivariate volatility in financial time series. The talk will briefly review particle filtering, or sequential Monte Carlo, methods. An overview of the multivariate volatility modelling literature will be given. As most financial returns exhibit heavy tails and skewness, we consider a model for the returns based on the skew-t distribution, while the volatility is assumed to follow a Wishart autoregressive process. We define a new type of Wishart autoregressive process and highlight some of its properties and advantages. Particle filter based inference for this model is discussed and a novel approach to estimating static parameters is provided. Furthermore, an alternative for estimating higher-dimensional data will be given. The proposed methodology is illustrated with two data sets consisting of asset returns from the FTSE100 stock exchange and currency exchange rates. 



Apr 30  Thu  Janine Illian (University of St Andrews, St Andrews, UK and NTNU Trondheim, Norway)  Statistics Seminar  
14:00  Developing complex spatial models for the real world – a multidisciplinary symbiosis  
Hicks Seminar Room J11  
Abstract: Strongly motivated by interdisciplinary research, substantial advances have been made in the development of practically relevant spatial statistical methodology. In the context of spatial point process models, this has been the case in particular for log-Gaussian Cox processes. Facilitated by the recent development of efficient and very accurate approximation methods for fitting models based on spatial random fields, it has become possible to develop and apply flexible and realistically complex spatial models without prohibitive computational cost (Rue et al. 2009; Lindgren et al. 2011; Illian et al. 2012a and b). The R library R-INLA has been instrumental in making these methods available to non-specialist users and in promoting their usage in practice. This talk outlines the mutual benefits of developing both methodology and software as part of a continuing dialogue between method developers and ecologists. Highlights of this symbiosis and recent developments resulting from it are presented. We illustrate these with a number of applications from ecology and beyond. 



May 7  Thu  Daniel Williamson (Exeter)  Statistics Seminar  
14:00  Posterior belief assessment: extracting meaningful subjective judgements from Bayesian analyses with complex statistical models  
Hicks Seminar Room J11  
Abstract: In a Bayesian analysis of any reasonable complexity, many, if not all, of the prior and likelihood judgements we specify in order to make progress are not believed (or owned) by either the analyst or the subject expert. In what sense, then, should we be able to attribute meaning to a large sample from the posterior distribution? Foundationally, is the posterior distribution a probability distribution at all and, if not, what is it and what can it be used for? In this talk I will present a methodology for extracting judgements for key quantities from a large Bayesian analysis. We call this posterior belief assessment, and it is based on the idea that there are many other Bayesian analyses that you might have performed (where, for example, you used different prior/model forms for sub-components of the statistical model). We impose forms of exchangeability and co-exchangeability over key derived posterior quantities under each of these theoretical Bayesian analyses and use these, a handful of alternative analyses and temporal sure preference to derive posterior judgements that we show are closer to what de Finetti termed prevision than the corresponding judgements from your original analysis. We argue that posterior belief assessment is a tractable and powerful alternative to robust Bayesian analysis, and illustrate with an example of calibrating an expensive ocean model in order to quantify uncertainty about global mean temperature in the real ocean. 



May 14  Thu  Zdzislaw Brzezniak (York)  Statistics Seminar  
14:00  Strong and weak solutions to stochastic Landau–Lifshitz equations  
Hicks Seminar Room J11  
Abstract: I will speak about the existence of weak solutions (and the existence and uniqueness of strong solutions) to the stochastic Landau–Lifshitz equations for multi- (and one-) dimensional spatial domains. I will also describe the corresponding Large Deviations principle and its applications to a ferromagnetic wire. The talk is based on joint works with B. Goldys and T. Jegaraj. 



Sep 24  Thu  Nic Freeman (Sheffield)  Statistics Seminar  
14:15  Cluster growth in a forest fire model.  
LT7  
Abstract: I will discuss the limiting behaviour of a mean field forest fire model as the size of the model tends to infinity. The model is closely related to the dynamical Erdős–Rényi random graph. We study a particular regime in which the model displays self-organized criticality and produces clusters of heavy-tailed size. 



Sep 24  Thu  Remco van der Hofstad (Eindhoven)  Statistics Seminar  
15:45  Competition and diffusion in random graphs (The 2015 Applied Probability Trust Lecture)  
LT7  
Abstract: Empirical findings have shown that many real-world networks share fascinating features. Indeed, many real-world networks are small worlds, in the sense that typical distances are much smaller than the size of the network. Further, many real-world networks are scale-free, in the sense that there is a high variability in the number of connections of the elements of the networks, making these networks highly inhomogeneous. Such networks are typically modeled using random graphs with power-law degree sequences. In this lecture, we will investigate the behavior of competition processes on scale-free random graphs with finite-mean, but infinite-variance, degrees. Take two vertices uniformly at random, or at either side of an edge chosen uniformly at random, and place an individual of two distinct types at these two vertices. Equip the edges with traversal times, which could be different for the two types. Then let each of the two types invade the graph, such that any other vertex can only be occupied by the type that gets there first. Let the speed of a type be the inverse of the expected traversal time of an edge by that type. We distinguish two cases. When the traversal times are exponential, we see that one (not necessarily the faster) type will occupy almost all vertices, while the losing type only occupies a bounded number of vertices. This is reflected in the ABBA lyrics ``The winner takes it all, the loser's standing small''. In particular, no asymptotic coexistence can occur. Work in progress investigates whether this occurs more generally. On the other hand, for deterministic traversal times, the faster type always gets the majority of the vertices, while the other occupies a subpolynomial number. When the speeds are the same, asymptotic coexistence (in the sense that both types occupy a positive proportion of the vertices) occurs with positive probability. 
This lecture is based on joint work with Mia Deijfen, Julia Komjathy and Enrico Baroni, and builds on earlier work with Gerard Hooghiemstra, Shankar Bhamidi and Dmitri Znamenski. 



Nov 26  Thu  Francisco Alejandro Díaz De la O (Liverpool)  Statistics Seminar  
14:00  Subset Simulation for Bayesian Updating and Model Selection  
Lecture Theatre B  
Abstract: On the one hand, the problems of model updating and model selection can be tackled using a Bayesian approach: the model parameters to be identified are treated as uncertain and the inference is done in terms of their posterior distribution. On the other hand, the engineering structural reliability problem can be solved by advanced Monte Carlo simulation techniques such as Subset Simulation. Recently, a formulation that connects the Bayesian updating problem and the structural reliability problem has been established. This opens up the possibility of efficient model calibration and model selection using Subset Simulation. The formulation, called BUS (Bayesian Updating with Structural reliability methods), is based on a rejection principle. Its theoretical correctness and efficiency require the prudent choice of a multiplier, which has remained an open question. Motivated by this problem, this talk presents a study of BUS. The discussion will lead to a revised formulation that allows Subset Simulation to be used for Bayesian updating and model selection without having to choose a multiplier in advance. 



Feb 12  Fri  Simon Tavaré (Cambridge)  Statistics Seminar  
14:00  How often does a random mapping have distinct component sizes?  
Hicks Seminar Room J11  
Abstract: One of the classical results about a random permutation of $[n] = \{1,2, \ldots,n\}$ is that the probability it has distinct cycle lengths is asymptotically $\exp(-\gamma) \approx 0.561$; here $\gamma$ is Euler’s constant. In this talk I will discuss the analogous problem for a broad class of random decomposable combinatorial structures that includes random mappings. I will illustrate how discrete process approximations can be used to answer the question in the title, and many related problems, in a very simple way. As a byproduct I will describe some interesting methods for simulating the component count process of these structures. 
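The limiting value quoted above is easy to probe empirically. Below is a minimal simulation sketch (the choices of $n$ and the number of trials are arbitrary illustrative values, not from the talk): draw uniform random permutations, extract their cycle lengths, and count how often all lengths are distinct.

```python
import random

def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a list perm, with perm[i] = image of i."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        lengths.append(length)
    return lengths

def distinct_cycle_fraction(n, trials, seed=0):
    """Estimate P(all cycle lengths distinct) for a uniform random permutation of [n]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        lengths = cycle_lengths(perm)
        if len(lengths) == len(set(lengths)):
            hits += 1
    return hits / trials

# For large n the estimate should sit near exp(-gamma) ~= 0.561.
print(distinct_cycle_fraction(n=500, trials=2000))
```

Even modest values of n give an estimate in the right vicinity, though convergence to the limit is slow.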



Feb 18  Thu  Joakim Beck (UCL)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Mar 3  Thu  Andrew Golightly (Newcastle)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Mar 17  Thu  Jim Griffin (Kent)  Statistics Seminar  
14:00  Adaptive MCMC schemes for variable selection problems (co-authors: Krys Latuszynski and Mark Steel)  
Hicks, Lecture theatre C  
Abstract: Data sets with many variables (often in the hundreds, thousands, or more) are routinely collected in many disciplines. This has led to interest in variable selection in regression models with a large number of variables. A standard Bayesian approach defines a prior on the model space and uses Markov chain Monte Carlo methods to sample the posterior. Unfortunately, the size of the space (2^p if there are p potential variables) and the use of simple proposals in Metropolis–Hastings steps have led to samplers that mix poorly over models. In this talk, I will describe two adaptive Metropolis–Hastings schemes which adapt an independence proposal to the posterior distribution. This leads to substantial improvements in mixing over standard algorithms in large data sets. The methods will be illustrated on simulated and real data with hundreds or thousands of possible variables. 
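To give a flavour of the kind of sampler involved (a generic sketch, not the authors' scheme): Metropolis–Hastings over binary inclusion vectors, with an independence proposal whose inclusion probabilities adapt towards the running marginal inclusion frequencies. The BIC-style model score, the clipping constants and the per-iteration adaptation are all illustrative simplifications; in particular, valid adaptive MCMC needs conditions (e.g. diminishing adaptation) that this toy ignores.

```python
import numpy as np

def log_score(gamma, X, y):
    """Toy log model score: BIC-style penalised Gaussian log-likelihood
    for the regression of y on the columns of X selected by gamma."""
    n = len(y)
    k = int(gamma.sum())
    if k == 0:
        rss = float(np.sum((y - y.mean()) ** 2))
    else:
        Xg = X[:, gamma.astype(bool)]
        beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
        rss = float(np.sum((y - Xg @ beta) ** 2))
    return -0.5 * n * np.log(rss / n) - 0.5 * k * np.log(n)

def adaptive_mh(X, y, iters=2000, seed=0):
    """Independence-proposal MH over inclusion vectors; the proposal's
    inclusion probabilities track the running marginal means."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    gamma = np.zeros(p)
    cur = log_score(gamma, X, y)
    q = np.full(p, 0.5)            # adaptive proposal probabilities
    running = np.zeros(p)
    for t in range(1, iters + 1):
        prop = (rng.random(p) < q).astype(float)
        new = log_score(prop, X, y)
        # Independence proposal: correct for q in the acceptance ratio.
        log_q_prop = np.sum(np.where(prop == 1, np.log(q), np.log1p(-q)))
        log_q_cur = np.sum(np.where(gamma == 1, np.log(q), np.log1p(-q)))
        if np.log(rng.random()) < new - cur + log_q_cur - log_q_prop:
            gamma, cur = prop, new
        running += gamma
        q = np.clip(running / t, 0.05, 0.95)   # adapt, kept away from 0/1
    return running / iters

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] * 2.0 + rng.normal(size=200)
probs = adaptive_mh(X, y)
print(probs)   # inclusion frequency of variable 0 should dominate
```

The adaptation means good variables are proposed for inclusion more and more often, which is the intuition behind the improved mixing described in the abstract.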



Apr 21  Thu  Ruth King (Edinburgh)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


May 5  Thu  Pete Dodd (Sheffield)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


May 19  Thu  Dler Kadir and Abdulaziz Alenazi (Sheffield)  Statistics Seminar  
14:00  Dler: Markov chain Monte Carlo estimation for autoregressive time series. Abdulaziz: A fully Bayesian differential-shrinkage approach to incorporating functional genomic information into case-control fine mapping studies  
Hicks, Lecture Theatre C  
Abstract: Dler: The purpose of this talk is to discuss Markov chain Monte Carlo (MCMC) estimation for stationary autoregressive time series. In order to do this, we need to derive the stationarity conditions used to place priors on the parameters of autoregressive models. We therefore first study the stationarity conditions, because stationarity affects which priors we set up in a Bayesian setting. Next, we apply MCMC in order to estimate the parameters based on these priors. Our interest is focused on the autoregressive model of order p (AR(p)) and the development and utility of Bayesian inference. One of the major obstacles in setting up a Bayesian estimation procedure for autoregressive models is the assumption of stationarity. In our view this is the reason why Bayesian estimation for such models is relatively limited. In this talk the stationarity conditions of AR(2) and AR(3) are revisited. We show that for the most general model AR(p) one can achieve sufficient stationarity conditions consisting of a set of linear inequalities. This can then be exploited to set up a Metropolis-within-Gibbs simulation scheme. We discuss the problem in some detail in the case of AR(3) and propose a second MCMC scheme for the AR(3) model. Throughout, we use simulated data to illustrate the proposed methodology. Abdulaziz: Bayesian approaches are particularly useful in fine mapping case-control studies as they naturally allow the inclusion of prior information relating to functional significance. We use the normal-gamma (NG) prior proposed by Griffin and Brown and modify it to allow the inclusion of functional information in the form of published functional significance scores. These scores assimilate functional information from many online sources and combine them into a single score. 
Rather than use the correct logistic likelihood for the response, which is computationally more demanding, we use the asymptotic Gaussian distribution of the maximum likelihood estimates of the model coefficients (log odds ratios). This enables us to speed up our MCMC analysis by using the Gaussian linear model framework. The NG prior assumes a hierarchical form for the coefficients which is similar to the normal-exponential-gamma prior used in HyperLASSO, but allows more flexibility in the shrinkage imposed by the prior. We calibrate the NG hyperparameters using published top hits from large breast cancer genome-wide association studies. We allow the functional significance scores to alter the prior probability density function of the log odds ratio on a SNP-by-SNP basis and show how this can be used to improve the detection of causal variants. We show, using simulated case-control data, that our modified NG prior can give higher true positive rates at relevant low false positive rates compared to logistic regression, piMASS, HyperLASSO and the standard NG prior. 
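The AR(p) stationarity condition underlying the first talk can be checked numerically via the characteristic roots: the process X_t = phi_1 X_{t-1} + … + phi_p X_{t-p} + e_t is stationary iff all roots of z^p − phi_1 z^{p-1} − … − phi_p lie strictly inside the unit circle. A minimal sketch (the example coefficients are arbitrary, not from the talk):

```python
import numpy as np

def is_stationary(phi):
    """Check stationarity of an AR(p) model
    X_t = phi[0]*X_{t-1} + ... + phi[p-1]*X_{t-p} + e_t.
    Stationary iff every root of z^p - phi_1 z^{p-1} - ... - phi_p
    has modulus strictly less than 1."""
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0))

# AR(2) examples: the stationarity region is the triangle
# phi_1 + phi_2 < 1,  phi_2 - phi_1 < 1,  |phi_2| < 1.
print(is_stationary([0.5, 0.3]))   # inside the triangle -> True
print(is_stationary([0.5, 0.6]))   # phi_1 + phi_2 > 1 -> False
```

For AR(2) these root conditions reduce exactly to the triangle of linear inequalities shown in the comments, which is the kind of linear-inequality description of stationarity the abstract refers to.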



Oct 26  Wed  Professor Stephen Senn (Competence Center for Methodology and Statistics, CRP-Santé)  Statistics Seminar  
17:15  Numbers needed to mislead, meta-analysis and muddled thinking  
Lecture Theatre 4, The Diamond  
Abstract: The ardent espousal by the evidence-based medicine movement of numbers needed to treat (NNTs) as a way of making difficult statistical concepts simple and concrete has had the unintended consequence of sowing confusion. Many users, including many in the evidence-based movement themselves, have interpreted these statistics as indicating what proportion of patients benefit from treatment. However, they cannot deliver this information. I shall explain this with the example of a recent Cochrane Collaboration meta-analysis of paracetamol against placebo in trials of tension headache, for which the plain language summary claimed: "The outcome of being pain free or having only mild pain at two hours was reported by 59 in 100 people taking paracetamol 1000 mg, and in 49 out of 100 people taking placebo (high quality evidence), meaning that only 10 in 100 people benefited because of paracetamol 1000 mg." With the aid of a simple model, also illustrated (just for fun) by a simulation, I shall show that the plain language conclusion is plain wrong. The observed facts do not necessarily mean that only 10 in 100 people benefited. The combination of arbitrary dichotomies and NNTs has a dangerous ability to deceive and may be leading us to expect much more of personalised medicine than it can deliver. All welcome. Admission to the lecture is free, but registration is required. 



Oct 27  Thu  Alison Parton (Sheffield, SoMaS)  Statistics Seminar  
14:00  A hybrid MCMC sampler for inferring animal movements and behaviours from GPS observations  
F20  
Abstract: Although animal locations gained via GPS etc. are typically observed on a discrete time scale, movement models formulated in continuous time are preferable, avoiding the struggles experienced in discrete time when faced with irregular observations or the prospect of comparing analyses on different time scales. A class of models able to emulate a range of movement ideas is defined by representing movement as a combination of stochastic processes describing both speed and bearing. This framework can then be extended to allow multiple behavioural modes through a continuous-time Markov process. Bayesian inference for such models is described through the use of a hybrid MCMC approach. Such inference relies on an augmentation of the animal’s locations in discrete time with a more detailed movement path gained via simulation techniques. Simulated and real data on an individual reindeer (Rangifer tarandus) will illustrate the presented methods. 



Nov 3  Thu  Heiko Strathmann (Gatsby, UCL)  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Nov 10  Thu  Dr David Wyncoll (HR Wallingford)  Statistics Seminar  
14:00  National-scale multivariate extreme value analysis for coastal flood risk analysis  
Hicks Seminar Room J11  
Abstract: Coastal flooding in the UK is driven by the joint occurrence of large waves, winds and sea levels. In order to quantify the flood risk at a single site it is important to study the dependence between these variables at extreme levels. The spatial dependence between coastal locations is also important for quantifying the likelihood of single large-scale coastal flooding events. We present a national-scale multivariate extreme value analysis of offshore drivers of coastal flooding in England and Wales. This appropriately captures dependences between both extreme and non-extreme driving variables at and between multiple coastal locations. The output of this analysis is a large Monte Carlo sample of plausible joint events that may be propagated through a chain of emulated numerical models to estimate the risk of large-scale coastal flooding. 



Nov 10  Thu  Sajni Malde (HR Wallingford)  Statistics Seminar  
15:30  
Hicks Seminar Room J11  


Dec 8  Thu  Stefano Castruccio (Newcastle)  Statistics Seminar  
14:00  Global Space-Time Emulators for Ensemble of Opportunities: Assessing Scenario Uncertainty for CMIP5  
Hicks Seminar Room J11  
Abstract: Simulating Earth System Models (ESMs) is among the most challenging exercises of contemporary science. ESMs require an extremely high-dimensional input, comprising a value of the forcing scenario for each year, and produce an even higher-dimensional output in space, time and variables. Given the considerable computational and logistic challenges of performing even a small set of simulations, an ensemble comprises a very limited number of runs. In the case of the CMIP5 ensemble, the reference for the latest IPCC assessment report, each modelling group submitted long-term simulations under at most four scenarios, thus providing very limited information for policy making. An emulator in scenario space can be developed to overcome these limitations. However, the modest number of runs, paired with the extremely large dimensionality of the input and output space, poses significant challenges for the development of the statistical methodology. In this talk, I will present a scenario emulator for ESMs that leverages the temporal structure of the input/output space, the causality principle and the gridded geometry of the output. I will present an application of this methodology to temperature and wind data in the case of two ensembles, and I will show how the emulator provides accurate results for a dataset of tens of millions of data points. 



Dec 8  Thu  Finn Lindgren (Edinburgh)  Statistics Seminar  
15:30  EUSTACE: Latent Gaussian process models for weather and climate reconstruction  
Hicks Seminar Room J11  
Abstract: The EUSTACE project will give publicly available daily estimates of surface air temperature since 1850 across the globe for the first time by combining surface and satellite data using novel statistical techniques. To this end, a spatiotemporal multiscale statistical Gaussian random field model is constructed, using connections between SPDEs and Markov random fields to obtain sparse matrices for the practical computations. The extreme size of the problem necessitates the use of iterative solvers, making use of the multiscale structure of the model to design an effective preconditioner. 



Oct 12  Thu  Dino Sejdinovic (Oxford)  Statistics Seminar  
14:00  Approximate Kernel Embeddings and Symmetric Noise Invariance  
LT 9  
Abstract: Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of the applications of the approximate kernel embeddings to Bayesian computation. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions robust to the distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data. https://arxiv.org/abs/1703.07596 
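As background for the framework above, the MMD itself takes only a few lines to estimate. A minimal numpy sketch (the biased V-statistic estimator with a Gaussian RBF kernel; the fixed bandwidth and the Gaussian test data are arbitrary illustrative choices):

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between 1-D samples x and y."""
    d2 = (x[:, None] - y[None, :]) ** 2
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y:
    mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    kxx = rbf_kernel(x, x, bandwidth).mean()
    kyy = rbf_kernel(y, y, bandwidth).mean()
    kxy = rbf_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(size=500), rng.normal(size=500))
diff = mmd2_biased(rng.normal(size=500), rng.normal(loc=2.0, size=500))
print(same, diff)   # the shifted sample gives a much larger MMD
```

Samples from the same distribution give an estimate near zero, while a mean shift produces a clearly larger value, which is what makes the MMD usable as a nonparametric two-sample test statistic.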



Oct 19  Thu  Mauricio Alvarez (Sheffield)  Statistics Seminar  
14:00  


Nov 9  Thu  Arthur Gretton (UCL)  Statistics Seminar  
14:00  


Nov 16  Thu  Timothy Waite (Manchester)  Statistics Seminar  
14:00  


Dec 7  Thu  Maria Kalli (Kent)  Statistics Seminar  
14:00  


Feb 15  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
15:00  Stan: better, faster MCMC – a user review  
F41  


Apr 19  Thu  Martine Barons (Warwick)  Statistics Seminar  
14:00  
LT3  


Feb 7  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
14:00  Accounting for Uncertainty in Estimates of Extremes  
LT E  
Abstract: Devastating consequences can flow from the failure of certain structures, such as coastal flood defences, nuclear installations, and oil rigs. Their design needs to be robust under rare (p < 0.0001) extreme conditions, but how can the designers use data typically from only a few decades to predict the size of an event that might occur once in 10,000 years? Extreme Value Theory claims to provide a sound basis for such far-out-of-sample prediction, and using Bayesian methods a full posterior distribution can be obtained. If the past data are supplemented by priors that take into account expert opinion, seemingly tight estimates result. Are such claims justified? Has all uncertainty been taken into account? My research is addressing these questions. 



Feb 21  Thu  Sophia Wright (Warwick)  Statistics Seminar  
14:00  Bayesian Networks, Total Variation and Robustness  
LT E  
Abstract: This talk explores the robustness of large Bayesian Networks when applied in decision support systems which have a pre-specified subset of target variables. We develop new methodology, underpinned by the total variation distance, to determine whether simplifications which are currently employed in the practical implementation of such graphical systems are theoretically valid. This same process can identify areas of the system which should be prioritised if elicitation is required. This versatile framework enables us to study the effects of misspecification within a Bayesian network (BN), and also extend the methodology to quantify temporal effects within Dynamic BNs. Unlike current robustness analyses, our new technology can be applied throughout the construction of the BN model, enabling us to create tailored, bespoke models. For illustrative purposes we shall explore the field of Food Security within the UK. 



Feb 28  Thu  Wil Ward (Sheffield)  Statistics Seminar  
14:00  A Variational Approach to Approximating State Space Gaussian Processes  
LT E  
Abstract: The state space representation of a Gaussian process (GP) models the dynamics of an unknown (nonlinear) function as a white-noise driven Itô differential equation. Representation in this form allows for the construction of joint models that mix known dynamics (e.g. population) with latent unknown input. Where these interactions are nonlinear, or observed through non-Gaussian likelihoods, there is no exact solution and approximation techniques are required. This talk introduces an approach using black-box variational inference to model surrogate samples and estimate the underlying parameters. The approximations are compared with full batch solutions and demonstrated to be indistinguishable in two-sample tests. Software and implementation challenges will also be addressed. 



Mar 7  Thu  Christian Fonseca Mora (Costa Rica)  Statistics Seminar  
14:00  Stochastic PDEs in Infinite Dimensional Spaces  
LT E  
Abstract: In this talk we will give an introduction to SPDEs in spaces of distributions. In the first part of the talk we consider a model of environmental pollution with Poisson deposits that will help to introduce the basic concepts for the study of SPDEs on infinite dimensional spaces. In the second part of the talk, we introduce a generalized form of SPDEs in spaces of distributions and explain conditions for the existence and uniqueness of its solutions. For this talk we will not assume any previous knowledge on SPDEs. 



Mar 14  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Variational inference reading group  
LT E  
Abstract: We will be spending two seminar slots on the following: Variational Inference: A Review for Statisticians https://arxiv.org/abs/1601.00670 David M. Blei, Alp Kucukelbir, Jon D. McAuliffe 



Mar 21  Thu  Theo Kypraios (Nottingham)  Statistics Seminar  
14:00  Recent Advances in Identifying Transmission Routes of Healthcare Associated Infections using Whole Genome Sequence Data  
LT E  
Abstract: Healthcare-associated infections (HCAIs) remain a problem worldwide, and can cause severe illness and death. It is estimated that 5–10% of acute-care patients are affected by nosocomial infections in developed countries, with higher levels in developing countries. Statistical modelling has played a significant role in increasing understanding of HCAI transmission dynamics. For instance, many studies have investigated the dynamics of MRSA transmission in hospitals, estimating transmission rates and the effectiveness of various infection control measures. However, uncertainty about the true routes of transmission remains, and this is reflected in the uncertainty of the parameters governing transmission. Until recently, the collection of whole genome sequence (WGS) data for bacterial organisms has been prohibitively complex and expensive. However, technological advances and falling costs mean that DNA sequencing is becoming feasible on a larger scale. In this talk we first describe how to construct statistical models which incorporate WGS data with regular HCAI surveillance data (admission/discharge dates etc.) to describe the pathogen's transmission dynamics in a hospital ward. Then, we show how one can fit such models to data within a Bayesian framework, accounting for unobserved colonisation times and imperfect screening sensitivity, using efficient Markov chain Monte Carlo algorithms. Finally, we illustrate the proposed methodology using MRSA surveillance data collected from a hospital in north-east Thailand. 



Mar 28  Thu  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Variational inference reading group  
LT E  
Abstract: We will be spending two seminar slots on the following: Variational Inference: A Review for Statisticians https://arxiv.org/abs/1601.00670 David M. Blei, Alp Kucukelbir, Jon D. McAuliffe 



Apr 2  Tue  Arne Grauer, Lukas Lüchtrath (Cologne)  Statistics Seminar  
16:00  The age-dependent random connection model  
F28  
Abstract: We consider a class of growing graphs embedded into the $d$-dimensional torus where new vertices arrive according to a Poisson process in time, are randomly placed in space and connect to existing vertices with a probability depending on time, their spatial distance and their relative ages. This simple model for a scale-free network is called the age-based spatial preferential attachment network and is based on the idea of preferential attachment with spatially induced clustering. The graphs converge weakly locally to a variant of the random connection model, which we call the age-dependent random connection model. This is a natural infinite graph on a Poisson point process where points are marked by a uniformly distributed age and connected with a probability depending on their spatial distance and both ages. We use the limiting structure to investigate the asymptotic degree distribution, clustering coefficients and typical edge lengths in the age-based spatial preferential attachment network. 



May 9  Thu  Rebecca Killick (Lancaster)  Statistics Seminar  
14:00  Computationally Efficient Multivariate Changepoint Detection with Subsets  
LT E  
Abstract: Historically, much of the research on changepoint analysis has focused on the univariate setting. Due to the growing number of high-dimensional datasets there is an increasing need for methods that can detect changepoints in multivariate time series. In this talk we focus on the problem of detecting changepoints where only a subset of the variables under observation undergo a change, so-called subset multivariate changepoints. One approach to locating changepoints is to choose the segmentation that minimises a penalised cost function via a dynamic program. The work in this presentation is the first to create a dynamic program specifically for detecting changes in subset-multivariate time series. The computational complexity of the dynamic program means it is infeasible even for medium-sized datasets. Thus we propose a computationally efficient approximate dynamic program, SPOT. We demonstrate that SPOT always recovers a better segmentation, in terms of penalised cost, than other approaches which assume every variable changes. Furthermore, under mild assumptions the computational cost of SPOT is linear in the number of data points. In small simulation studies we demonstrate that SPOT provides a good approximation to exact methods but is feasible for datasets that contain thousands of variables observed at millions of time points. Furthermore we demonstrate that our method compares favourably with other commonly used multivariate changepoint methods and achieves a substantial improvement in performance when compared with fully multivariate methods. 
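The penalised-cost dynamic program referred to above can be illustrated in its simplest form: exact optimal partitioning of a single univariate series with a squared-error segment cost. This is the generic O(n^2) recursion, not SPOT itself, and the penalty value below is an arbitrary illustrative choice.

```python
import numpy as np

def optimal_partitioning(data, penalty):
    """Exact penalised-cost segmentation by dynamic programming (O(n^2)).
    Segment cost = residual sum of squares about the segment mean."""
    n = len(data)
    cum = np.concatenate(([0.0], np.cumsum(data)))
    cum2 = np.concatenate(([0.0], np.cumsum(np.square(data))))

    def seg_cost(i, j):   # cost of data[i:j], j > i, via cumulative sums
        s, s2, m = cum[j] - cum[i], cum2[j] - cum2[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = -penalty    # so each segment contributes cost + penalty
    last = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):
            cand = best[i] + seg_cost(i, j) + penalty
            if cand < best[j]:
                best[j], last[j] = cand, i
    # Backtrack to recover the changepoint locations (segment starts).
    cps, j = [], n
    while j > 0:
        j = last[j]
        if j > 0:
            cps.append(j)
    return sorted(cps)

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(4, 1, 100)])
print(optimal_partitioning(y, penalty=3 * np.log(len(y))))
```

The quadratic cost of this exact recursion is precisely what motivates faster (pruned or approximate) dynamic programs such as the one described in the talk.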



May 16  Thu  Christopher Fallaize (Nottingham)  Statistics Seminar  
14:00  Unlabelled Shape Analysis with Applications in Bioinformatics  
LT E  
Abstract: In shape analysis, objects are often represented as configurations of points, known as landmarks. The case where the correspondence between landmarks on different objects is unknown is called unlabelled shape analysis. The alignment task is then to simultaneously identify the correspondence between landmarks and the transformation aligning the objects. In this talk, I will discuss the alignment of unlabelled shapes, and discuss two applications to problems in structural bioinformatics. The first is a problem in drug discovery, where the main objective is to find the shape information common to all, or subsets of, a set of active compounds. The approach taken resembles a form of clustering, which also gives estimates of the mean shapes of each cluster. The second application is the alignment of protein structures, which will also serve to illustrate how the modelling framework can incorporate very general information regarding the properties we would like alignments to have; in this case, expressed through the sequence order of the points (amino acids) of the proteins. 



Oct 10  Thu  Richard Glennie (St Andrews)  Statistics Seminar  
14:00  Modelling latent processes in population abundance surveys using hidden Markov models  
K14  
Abstract: Distance sampling and spatial capture-recapture are statistical methods to estimate the number of animals in a wild population based on encounters between these animals and scientific detectors. Both methods estimate the probability an animal is detected during a survey, but do not explicitly model animal movement and behaviour. The primary challenge is that animal movement in these surveys is unobserved; one must average over all possible histories of each individual. In this talk, a general statistical model, with distance sampling and spatial capture-recapture as special cases, is presented that explicitly incorporates animal movement. An algorithm to integrate over all possible movement paths, based on quadrature and hidden Markov modelling, is given to overcome common computational obstacles. For distance sampling, simulation studies and case studies show that incorporating animal movement can reduce the bias in estimated abundance found in conventional models and expand the application of distance sampling to surveys that violate the assumption of no animal movement. For spatial capture-recapture, continuous-time encounter records are used to make detailed inference on where animals spend their time during the survey. For surveys conducted over discrete occasions, maximum likelihood models that allow for mobile activity centres are presented to account for transience, dispersal, and heterogeneous space use. These methods provide an alternative when animal movement causes bias in standard methods, and the opportunity to gain richer inference on how animals move, where they spend their time, and how they interact. 



Oct 14  Mon  Jeremy Oakley (Sheffield)  Statistics Seminar  
13:00  Deep Learning reading group: Chapter 6 from Goodfellow et al. (2016)  
LT 6  
Abstract: Discussion of Chapter 6 from "Deep Learning", by Goodfellow, Bengio and Courville https://www.deeplearningbook.org/ 



Oct 15  Tue  Emma Gordon (Director of Administrative Data Research UK)  Statistics Seminar  
16:00  Royal Statistical Society (RSS) Sheffield Local group seminar.
The potential and pitfalls of linked administrative data 

LT B  
Abstract: Administrative databases that are linked with each other or with survey data can allow deeper insights into the population’s life trajectories and needs, and signal opportunities for improved and ultimately more personalised service delivery. Yet government agencies have to meet several prerequisites to realise these benefits. First among them is a stable legal basis. Appropriate laws and regulations have to exist to allow data merging within the limits of existing privacy protection. When different institutions are involved, these regulations have to clearly define each agency's responsibilities in collecting, safeguarding and analysing data. Second are technical requirements. This includes creating a safe infrastructure for data storage and analysis and developing algorithms to match individuals when databases do not share common unique personal identifiers. Third is the buy-in of the population. Public communication can highlight the value added of linked databases and outline the steps taken to ensure data security and privacy. Involving citizens in dialogues about what data uses they are and are not comfortable with can help build public trust that appropriate limits are set and respected. 



Oct 24  Thu  Lyudmila Mihaylova (Sheffield)  Statistics Seminar  
14:00  Nonparametric Methods and Models with Uncertainty Propagation  
LT E  
Abstract: We are experiencing an enormous growth and expansion of data provided by multiple sensors. Current monitoring and control systems face challenges both in processing big data and in making decisions on the phenomena of interest at the same time. Urban systems are hugely affected. Hence, intelligent transport and surveillance systems need efficient methods for data fusion, tracking and prediction of individual vehicular traffic and aggregated flows. This talk will focus on two main methods able to solve such monitoring problems by fusing multiple types of data while dealing with nonlinear phenomena – sequential Markov chain Monte Carlo (SMCMC) methods with adaptive subsampling and Gaussian process regression methods. The first part of this talk will present an SMCMC approach able to deal with massive data by adaptively subsampling the sensor measurements. The main idea of the method is to approximate the logarithm of the likelihood ratio by performing a trade-off between complexity and accuracy. The approach's efficiency will be demonstrated on object tracking tasks. Next, Gaussian process methods will be presented for point and extended object tracking, i.e. both in space and in time. Using the derivatives of the Gaussian process leads to an efficient replacement of the multiple models that are usually necessary to represent the whole range of behaviour of a dynamic system. These methods give the opportunity to assess the impact of uncertainties, e.g. from the sensor data, on the developed solutions. 



Oct 28  Mon  Jeremy Oakley (Sheffield)  Statistics Seminar  
13:00  Deep Learning reading group: 6.5–7.2 from Goodfellow et al. (2016)  
LT 6  


Oct 31  Thu  Tom Hutchcroft (Cambridge)  Statistics Seminar  
14:00  Phase transitions in hyperbolic spaces  
LT E  
Abstract: Many questions in probability theory concern the way the geometry of a space influences the behaviour of random processes on that space, and in particular how the geometry of a space is affected by random perturbations. One of the simplest models of such a random perturbation is percolation, in which the edges of a graph are either deleted or retained independently at random with retention probability p. We are particularly interested in phase transitions, in which the geometry of the percolated subgraph undergoes a qualitative change as p is varied through some special value. Although percolation has traditionally been studied primarily in the context of Euclidean lattices, the behaviour of percolation in more exotic settings has recently attracted a great deal of attention. In this talk, I will discuss conjectures and results concerning percolation on the Cayley graphs of non-amenable groups and hyperbolic spaces, and give the main ideas behind our recent result that percolation in any transitive hyperbolic graph has a non-trivial phase in which there are infinitely many infinite clusters. The talk is intended to be accessible to a broad audience. 
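The phase transition is easy to see numerically even in the simplest Euclidean setting mentioned in the abstract; an illustrative sketch (not from the talk) of site percolation on a square lattice:

```python
import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(1)

def largest_cluster_fraction(p, n=200):
    """Site percolation on an n x n square lattice: each site is retained
    independently with probability p; return the fraction of all sites
    lying in the largest connected open cluster."""
    open_sites = rng.random((n, n)) < p
    labels, num = label(open_sites)          # 4-connectivity by default
    if num == 0:
        return 0.0
    sizes = np.bincount(labels.ravel())[1:]  # drop background label 0
    return sizes.max() / (n * n)

# Below the critical value (p_c ≈ 0.593 for this lattice) the largest
# cluster is a vanishing fraction of the lattice; above it, a giant
# cluster emerges.
low = largest_cluster_fraction(0.3)
high = largest_cluster_fraction(0.8)
```

On non-amenable graphs, the subject of the talk, there is additionally an intermediate phase with infinitely many infinite clusters, which this Euclidean toy model does not exhibit.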



Nov 7  Thu  Deborah Ashby (Imperial College London, President of the Royal Statistical Society)  Statistics Seminar  
14:15  Royal Statistical Society (RSS) Sheffield Local group seminar.
Pigeonholes and mustard seeds: Growing capacity to use data for society 

Hicks Seminar Room J11  
Abstract: The Royal Statistical Society was founded to address social problems ‘through the collection and classification of facts’, leading to many developments in the collection of data, the development of methods for analysing them, and the development of statistics as a profession. Nearly 200 years later, an explosion in computational power has led, in turn, to an explosion in data. We outline the challenges and the actions needed to exploit that data for the public good, and to address the step change in statistical skills and capacity development necessary to enable our vision of a world where data are at the heart of understanding and decision-making. 



Nov 11  Mon  CANCELLED  Statistics Seminar  
13:00  Deep Learning reading group  
LT 6  


Nov 21  Thu  Leo Bastos (LSHTM)  Statistics Seminar  
14:00  Modelling reporting delays for disease surveillance data  
LT E  
Abstract: One difficulty for real-time tracking of epidemics is the reporting delay. The reporting delay may be due to laboratory confirmation, logistic problems, infrastructure difficulties and so on. The ability to correct the available information as quickly as possible is crucial for decision making, such as issuing warnings to the public and local authorities. A Bayesian hierarchical modelling approach is proposed as a flexible way of correcting the reporting delays and quantifying the associated uncertainty. Implementation of the model is fast, due to the use of the integrated nested Laplace approximation (INLA). The approach is illustrated on dengue fever incidence data in Rio de Janeiro, and Severe Acute Respiratory Illness (SARI) data in Paraná state, Brazil. 
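A crude frequentist stand-in for such a delay correction (illustrative only; the talk's model is Bayesian, hierarchical, and fitted with INLA) scales each incomplete week of a reporting triangle by the estimated proportion of cases reported so far:

```python
import numpy as np

# Toy reporting triangle: rows = onset week, columns = reporting delay
# (weeks); NaN marks counts not yet observed. All numbers are made up.
triangle = np.array([
    [40., 30., 20., 10.],
    [50., 35., 25., np.nan],
    [45., 40., np.nan, np.nan],
    [60., np.nan, np.nan, np.nan],
])

def nowcast(tri):
    """Estimate eventual totals per onset week by dividing the counts
    reported so far by the cumulative reporting proportion at that delay,
    with the delay distribution estimated from fully observed weeks."""
    complete = tri[~np.isnan(tri).any(axis=1)]
    delay_probs = complete.sum(axis=0) / complete.sum()  # delay distribution
    cum = np.cumsum(delay_probs)
    est = []
    for row in tri:
        k = int(np.sum(~np.isnan(row)))      # number of delays observed so far
        est.append(np.nansum(row) / cum[k - 1])
    return np.array(est)

totals = nowcast(triangle)  # recent weeks are corrected upwards
```

The Bayesian model in the talk does this jointly across weeks and delays, which is what yields proper uncertainty quantification for the corrected counts.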



Nov 28  Thu  POSTPONED: Marcel Ortgiese (Bath)  Statistics Seminar  
14:00  
LT E  


Dec 5  Thu  POSTPONED: Heather Battey (Imperial)  Statistics Seminar  
14:00  Aspects of high-dimensional inference  
LT 10  


Dec 12  Thu  Jeremy Colman (Sheffield)  Statistics Seminar  
14:00  Simulation-Based Calibration (SBC)  
LT E  
Abstract: SBC is a relatively new method for checking Bayesian inference algorithms. Its advocates (Talts et al., 2017) argue that it identifies inaccurate computation and inconsistencies in model implementation, and also provides graphical summaries to indicate the nature of the underlying problems. An example of such a summary is given. Although SBC has emerged from the Stan development team, it is applicable to any Bayesian model that is capable of generating posterior samples. It does not require the use of any particular modelling language. I shall explain why there might indeed be a gap that SBC could fill, demonstrate how SBC works in practice, and discuss the balance between its costs and benefits. 
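The core SBC loop is short. A minimal sketch for a conjugate normal model (my own toy example, not from the paper), where the posterior sampler is exact and the rank statistics should therefore be uniform:

```python
import numpy as np

rng = np.random.default_rng(0)

def sbc_ranks(n_sims=1000, n_post=99):
    """Simulation-Based Calibration for the model
    theta ~ N(0, 1), y | theta ~ N(theta, 1).
    For each replicate: draw theta from the prior, simulate data,
    draw posterior samples, and record the rank of the true theta
    among them. A correct sampler gives ranks uniform on {0,...,n_post}."""
    ranks = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        theta = rng.normal(0.0, 1.0)                 # prior draw
        y = rng.normal(theta, 1.0)                   # simulated data
        # exact conjugate posterior: N(y/2, 1/2)
        post = rng.normal(y / 2.0, np.sqrt(0.5), size=n_post)
        ranks[i] = np.sum(post < theta)              # rank statistic
    return ranks

ranks = sbc_ranks()
# A histogram of `ranks` is the graphical summary Talts et al. recommend:
# deviations from uniformity (e.g. a U or hump shape) diagnose a
# mis-calibrated sampler.
```

Replacing the exact posterior draw with the sampler under test (e.g. an MCMC run) is all that changes in a real application.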



Feb 13  Thu  Ines Krissaane (Sheffield)  Statistics Seminar  
14:00  Robustness of Variational Inference under Model Misspecification  
LT 6  
Abstract: In many complex scientific problems, we deal with a model that is misspecified relative to the data-generating process, in the sense that there is no parameter setting that allows the model to perfectly replicate the data. We will review the recent paper Generalized Variational Inference (https://arxiv.org/pdf/1904.02063.pdf) and present arguments for using VI under model misspecification. As an application, we will focus on the Hodgkin-Huxley model of action potentials, and infer parameters from uncertain experimental measurements using a variational autoencoder method. 



Feb 27  Thu  Mark Dunning, Tim Freeman, Sokratis Kariotis (Sheffield)  Statistics Seminar  
16:30  Statistical and Data Analysis Challenges in Bioinformatics  
K14  
Abstract: Bioinformatics is a multidisciplinary subject that combines aspects of biology, computer science and statistics. Modern experimental techniques are able to generate vast amounts of data that can profile an individual's genome and offer insights into the development of disease and potential novel therapeutics. In this talk, I will describe the challenges faced by Bioinformaticians trying to deal with such data on a daily basis and the opportunities for collaboration with other disciplines to develop new analytical methods. 



Mar 19  Thu  Susan Cox (KCL)  Statistics Seminar  
14:00  
LT 6  


Apr 30  Thu  Heather Battey (Imperial)  Statistics Seminar  
14:00  
LT 6  


May 14  Thu  Steven Julious (Sheffield)  Statistics Seminar  
16:00  Florence Nightingale: The Passionate Statistician  
https://teams.microsoft.com/l/meetupjoin/19%3ameeting_YjVlZTY1NTItNGU4Mi00N2ZjLThmYWEtM2Y1NjExNjc5MTA1%40thread.v2/0?context=%7b%22Tid%22%3a%2219c3a1c9f5834a18b6ad75cc9c14243c%22%2c%22Oid%22%3a%22da5c99d8843a4aa784c229a3732945ed%22%7d  
Abstract: The Passionate Statistician was the title given to Florence Nightingale by her first biographer, Sir Edward Cook. Florence Nightingale was a firm believer in the accurate quantification of evidence to inform decisions. It was her belief in the accurate collection and presentation of data that informed the work she undertook to improve military hospitals. She was of the view that “to understand God’s thoughts, we must study statistics for these are the measure of His purpose”, and she used her statistical abilities to inform debates that led to a decline in preventable deaths in military and civilian hospitals. This year marks 200 years since the birth of Florence Nightingale, and in this talk Steven will pay tribute to her work in statistics and its long-lasting impact. The webinar will take place on Microsoft Teams; you can join on the web or in the Teams app (if you have it), but you should not need to have an account. 



May 27  Wed  Adam Butler (BIOSS)  Statistics Seminar  
14:00  
LT 6  


May 12  Wed  Kevin Wilson and Cameron Williams (Newcastle)  Statistics Seminar  
14:00  A comparison of prior distribution aggregation methods  
Google Meet  
Abstract: When eliciting prior distributions from experts, it may be desirable to combine them into a single group prior. There are many methods of expert-elicited prior aggregation, which can roughly be categorised into two types. Mathematical aggregation methods combine prior distributions using a mathematical rule, while behavioural aggregation methods assist the group of experts to come to a consensus prior through discussion. As many commonly used aggregation methods have different requirements in the elicitation stage, there are few, if any, comparisons between them. Using a clinical trial into a novel diagnostic test for Motor Neuron Disease as a case study, we elicited a number of prior distributions from a group of experts. We then aggregated these prior distributions using a range of mathematical aggregation methods, including Equal Weights linear pooling, the Classical Method, and a Bayesian aggregation method. We also undertook an in-person behavioural aggregation with the experts, using the Sheffield Elicitation Framework, or SHELF. Using expert answers to seed questions, for which the elicitors know the true values, we compare and contrast the different aggregation methods and their performance. We also demonstrate how all considered aggregation methods outperform the individual experts. 
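The simplest of the mathematical methods mentioned, Equal Weights linear pooling, just averages the experts' densities. An illustrative sketch with made-up expert priors (the numbers are hypothetical, not from the case study):

```python
import numpy as np
from scipy import stats

# Three hypothetical experts give normal priors for a treatment effect.
experts = [stats.norm(0.2, 0.10), stats.norm(0.35, 0.15), stats.norm(0.1, 0.20)]
weights = np.full(len(experts), 1.0 / len(experts))  # equal weights

def pooled_pdf(x):
    """Linear opinion pool: f(x) = sum_j w_j f_j(x)."""
    x = np.asarray(x, dtype=float)
    return sum(w * e.pdf(x) for w, e in zip(weights, experts))

# The pool is a finite mixture, so it still integrates to one.
grid = np.linspace(-2.0, 2.0, 4001)
dx = grid[1] - grid[0]
area = float(np.sum(pooled_pdf(grid)) * dx)
```

Performance-weighted schemes such as the Classical Method replace the equal weights with weights calibrated on the seed questions described in the abstract.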



May 22  Mon  Lisa Hampson (Novartis)  Statistics Seminar  
14:00  Bayesian methods to improve quantitative decision making in drug development and the role of expert elicitation  
Hicks LT 6 / meet.google.com/onyuzabqyz  
Abstract: There are several steps to confirming the safety and efficacy of a new medicine. A sequence of trials, each with its own objectives, is usually required. Bayesian measures of risk, such as assurance or more generally probability of success (PoS), can be useful for informing decisions about whether a medicine should transition from one stage of development to the next. In this presentation, we describe a Bayesian approach for calculating PoS before pivotal (confirmatory) clinical trials are run which synthesizes internal clinical data, industry-wide success rates, and expert opinion or external data if needed. In particular, where there are differences between early phase and confirmatory trials, due to a change in outcome for example, we propose eliciting expert judgements to relate existing data to the unknown quantities of interest. We discuss two approaches for establishing a multivariate distribution for several related efficacy treatment effects within the Sheffield Elicitation Framework (SHELF) and describe how they were applied to evaluate the PoS of the registrational program of an asthma drug. We conclude by reflecting on some of the opportunities and practical challenges encountered when using elicitation to support the evaluation of PoS. 
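A minimal Monte Carlo sketch of assurance (illustrative assumptions of my own: a normal prior on the treatment effect and a one-sided z-test for the confirmatory trial; not the synthesis method of the talk):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def assurance(prior_mean, prior_sd, n_per_arm, sd=1.0, alpha=0.025,
              n_sims=20_000):
    """Assurance = expected power of a two-arm trial, averaging the
    frequentist power over a normal prior on the true effect delta.
    Success = one-sided z-test at level alpha."""
    delta = rng.normal(prior_mean, prior_sd, size=n_sims)  # prior draws
    se = sd * np.sqrt(2.0 / n_per_arm)        # SE of the effect estimate
    z_alpha = stats.norm.ppf(1 - alpha)
    power = stats.norm.cdf(delta / se - z_alpha)           # power given delta
    return float(power.mean())

# Unlike power at a fixed effect, assurance stays bounded away from 1
# even for large n, because the prior puts mass on negligible effects.
pos = assurance(prior_mean=0.3, prior_sd=0.2, n_per_arm=100)
```

The approach in the talk enriches the prior itself, combining internal data, industry-wide success rates, and elicited judgements, before taking this kind of expectation.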



Nov 9  Thu  Wei Xing (Sheffield)  Statistics Seminar  
14:00  Reliable AI for Engineering  
Hicks Seminar Room J11  
Abstract: Artificial intelligence (AI) has seismically shifted the landscape across multiple domains, including scientific computing, manufacturing, and engineering. However, the importance of Reliable AI extends beyond what general AI can offer, particularly in scenarios where the stakes are high. Reliable AI, as the name suggests, emphasizes reliability, robustness, and trustworthiness, crucial for real-world applications where uncertainties and high-stakes decisions are the norm. In this talk, I will share our development of reliable AI techniques using Bayesian models and show how these methods can be applied to problems in integrated circuit design and some other broader applications in engineering such as digital twins. 



Nov 20  Mon  Richard Wilkinson (Nottingham)  Statistics Seminar  
15:00  Adjoint-aided inference for latent force models  
Hicks Seminar Room J11  
Abstract: Linear systems occur throughout engineering and the sciences, most notably as differential equations. In many cases the forcing function for the system is unknown, and interest lies in using noisy observations of the system to infer the forcing, as well as other unknown parameters. In this talk I will show how adjoints of linear systems can be used to efficiently infer forcing functions modelled as Gaussian processes. Adjoints have recently come to prominence in machine learning, but mainly as an approach to compute derivatives of cost functions for differential equation models. Here, we use adjoints in a different way that allows us to analytically compute the least-squares estimator, or the full Bayesian posterior distribution of the unknown forcing. Instead of relying on solves of the original (forward) model, we can recast the problem as n adjoint problems, where n is the number of data points. All that is required is the ability to solve adjoint systems numerically: it does not rely upon additional tractability of the linear system such as the ability to compute Green’s functions. We'll demonstrate this approach by inferring the pollution source in an advection-diffusion-reaction equation. 



Feb 13  Tue  Emmanouil Kalligeris (Sheffield)  Statistics Seminar  
15:00  A Twisted Markov Switching Mechanism for the Modelling of Incidence Rate Data  
Hicks Seminar Room J11  
Abstract: Various time series models have been used over the years to capture the dynamic behaviour of significant variables in scientific fields such as epidemiology, seismology, meteorology and finance. In this work, a conditional mean Markov regime switching model with covariates is proposed and studied for the analysis of incidence rate data. The components of the model are selected by penalised likelihood techniques in conjunction with the Expectation-Maximisation algorithm, with the aim of achieving a high level of robustness with respect to modelling the dynamic behaviour of epidemiological data. In addition to statistical inference, change-point detection analysis is used to select the number of regimes, reducing the complexity associated with likelihood ratio tests. [Kalligeris EN, Karagrigoriou A, Parpoula C. (2023): On Stochastic Dynamic Modeling of Incidence Data. Int J Biostat, 10.1515/ijb-2021-0134] 



Feb 27  Tue  Prof. Robin Henderson (Newcastle University)  Statistics Seminar  
15:00  Event History and Topological Data Analysis  
Hicks Seminar Room J11  
Abstract: Topological data analysis has become popular in recent years, though mainly outside the statistical literature. In this talk we review some of the elements of topological data analysis and we show links to event history and survival analysis. We argue that exploiting topological data as event history can be useful in the analysis of data in the form of images. We propose a version of the well-known Nelson-Aalen cumulative hazard estimator for the comparison of topological features of random fields and for testing parametric assumptions. We suggest a Cox proportional hazards approach for the analysis of embedded metric trees. The Nelson-Aalen method is illustrated on globally distributed climate data and on neutral hydrogen distribution in the Milky Way. The Cox method is used to compare vascular patterns in fundus images of the eyes of healthy and diabetic retinopathy patients. 
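The Nelson-Aalen estimator itself is simple: the cumulative hazard H(t) is the sum over event times t_i ≤ t of d_i / n_i, with d_i events and n_i subjects at risk at t_i. A short illustrative implementation (standard estimator, using the usual events-before-censorings convention at tied times):

```python
import numpy as np

def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard from right-censored data.
    times: observation times; events: 1 = event, 0 = censored.
    Returns (event_times, H) where H steps up by d_i / n_i."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    order = np.argsort(times)
    times, events = times[order], events[order]
    n_at_risk = len(times)
    grid, H = [], []
    h = 0.0
    for t in np.unique(times):
        at_t = times == t
        d = events[at_t].sum()          # events at time t
        if d > 0:
            h += d / n_at_risk
            grid.append(t)
            H.append(h)
        n_at_risk -= at_t.sum()         # drop events and censorings at t
    return np.array(grid), np.array(H)

# Tiny worked example: events at t = 2, 3, 5; censorings at t = 3, 7.
t, H = nelson_aalen([2, 3, 3, 5, 7], [1, 1, 0, 1, 0])
```

In the talk this estimator is applied not to patient survival times but to the birth times of topological features of a random field, which is what makes the event-history link work.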



Apr 16  Tue  Dominic Grainger and Dr Ben Wigley (Sheffield)  Statistics Seminar  
15:00  Dominic: The Efficient Modelling of Individual Animal Movement in Continuous Time; Ben: Stressing over shape: A Procrustean investigation of dental fluctuating asymmetry.  
Hicks Seminar Room J11  


May 30  Thu  Statistics UQ Reading group  Statistics Seminar  
14:00  
Hicks Seminar Room J11  


Jun 11  Tue  Dr Zexun Chen (Edinburgh)  Statistics Seminar  
15:00  Peer-induced Fairness: A Simple Causal Approach for Algorithmic Bias Discovery in Credit Approval  
Abstract: In today's world, where AI and automation increasingly shape decision-making processes, ensuring algorithmic fairness is paramount. While much attention has been given to fairness concepts like statistical parity and equal opportunity, practical challenges in detecting and addressing bias remain. Traditional methods often involve embedding fairness metrics into algorithms, which can compromise their accuracy. In this seminar, I will introduce a fundamental shift in tackling algorithmic bias by presenting our novel "peer-induced fairness" framework. This approach leverages counterfactual fairness and advanced causal inference techniques, including the Single World Intervention Graph, to detect bias at the individual level through peer comparisons and hypothesis testing. Focusing on the context of credit approval, our framework addresses common issues such as data scarcity and imbalance, and operates independently of specific decision-making methodologies, such as classifier selection. It provides explainable feedback to individuals who receive adverse decisions, distinguishing between algorithmic bias, discrimination, and the capabilities of the subjects involved. Our framework has been validated using a dataset of SMEs, demonstrating its effectiveness in identifying unfair practices and suggesting practical interventions. The results show that 'peer-induced fairness' not only improves fairness in algorithmic decisions but also serves as a flexible, transparent, and adaptable tool for diverse applications. Finally, if time allows, I will present some of my working ideas around Gaussian process modelling, including multivariate Gaussian processes and constrained Gaussian processes. 



Jul 1  Mon  Statistics UQ Reading group  Statistics Seminar  
15:00  
Hicks Seminar Room J11  


Jul 16  Tue  Jeremy Oakley (Sheffield)  Statistics Seminar  
15:00  Reading group: Auto-Encoding Variational Bayes (Kingma and Welling, https://arxiv.org/pdf/1312.6114)  
Hicks LTD  


Jul 23  Tue  Jeremy Oakley (Sheffield)  Statistics Seminar  
14:00  Reading group: Auto-Encoding Variational Bayes (Kingma and Welling, https://arxiv.org/pdf/1312.6114)  Continued!  
Hicks Seminar Room J11  

