# Research Areas in Statistics

Here are the areas of Statistics in which research is being done currently.

The manner in which a component (or system) improves or deteriorates with time can be described by concepts of aging, and various aging notions have been proposed in the literature. Similarly, the lifetimes of two different systems can be compared using stochastic orders between the probability distributions of the corresponding (random) lifetimes, and various such orders have been defined in the literature. We study the concepts of aging and stochastic orders for various coherent systems. In many situations, the performance of a system can be improved by introducing some kind of redundancy into it. The problem of allocating redundant components to the components of a coherent system, in order to optimize its reliability or some other performance characteristic, is of considerable interest in reliability engineering, and it often leads to interesting theoretical results in probability theory. We study the problem of optimally allocating spares to the components of various coherent systems, and we compare the performance of systems arising from different allocations using concepts of aging and stochastic orders.
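
The allocation problem can be illustrated with a minimal sketch. It assumes a series system of independent components and a single active (parallel) spare; the reliabilities and the function names are hypothetical and chosen only for illustration. The sketch tries the spare on each component in turn and keeps the allocation with the highest system reliability.

```python
def parallel(p, q):
    """Reliability of two independent components in parallel (active redundancy)."""
    return 1.0 - (1.0 - p) * (1.0 - q)


def series(reliabilities):
    """Reliability of independent components connected in series."""
    result = 1.0
    for p in reliabilities:
        result *= p
    return result


def best_spare_allocation(components, spare):
    """Try the spare on each component; return (index, system reliability)."""
    best_index, best_value = None, -1.0
    for i in range(len(components)):
        boosted = list(components)
        boosted[i] = parallel(components[i], spare)
        value = series(boosted)
        if value > best_value:
            best_index, best_value = i, value
    return best_index, best_value


# Hypothetical component reliabilities and spare reliability.
components = [0.95, 0.80, 0.90]
index, reliability = best_spare_allocation(components, spare=0.85)
print("allocate spare to component", index, "system reliability", reliability)
```

In this example the spare goes to the weakest component, which is the familiar pattern for series systems with active redundancy. Comparing allocations through stochastic orders, as described above, is a stronger statement than this comparison at a single point.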

Faculty: Neeraj Mishra

Estimation of entropies of molecules is an important problem in molecular sciences. A method commonly used by molecular scientists is based on the assumption of a multivariate normal distribution for the internal molecular coordinates. For the multivariate normal distribution, we have proposed various estimators of entropy and established their optimum properties. The assumption of multivariate normality is adequate when the temperature at which the molecule is studied is low, so that the fluctuations in the internal coordinates are small. At higher temperatures, however, the multivariate normal distribution is inadequate, because the distributions of the dihedral angles exhibit multimodality and skewness. Moreover, the internal coordinates of molecules are circular variables, so the assumption of multivariate normality is inappropriate in principle. A nonparametric approach based on circular statistics is therefore desirable, and we have adopted such an approach for estimating the entropy of a molecule. This approach is receiving a lot of attention among molecular scientists.
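
For the multivariate normal case, the entropy of a p-dimensional normal distribution with covariance matrix Σ is ½ ln((2πe)^p |Σ|). The sketch below shows the simple plug-in estimator that substitutes the sample covariance matrix for Σ. This is an assumption made for illustration: it is not one of the optimal estimators referred to above, and the function names are hypothetical.

```python
import math


def sample_covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def normal_entropy(cov):
    """Entropy (in nats) of a multivariate normal with covariance matrix cov."""
    p = len(cov)
    return 0.5 * (p * math.log(2.0 * math.pi * math.e) + math.log(determinant(cov)))


# Hypothetical small sample of two internal coordinates.
sample = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
print(normal_entropy(sample_covariance(sample)))
```

The plug-in value depends only on the determinant of the sample covariance matrix, which is why it cannot capture the multimodality or circularity discussed above.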

Faculty: Neeraj Mishra

About fifty years ago, statistical inference problems were first formulated in the now-familiar “Ranking and Selection” framework. Ranking and selection problems broadly deal with the goal of ordering different populations in terms of the unknown parameters associated with them. We deal with the following aspects of ranking and selection problems:

1. Obtaining optimal ranking and selection procedures using decision theoretic approach;

2. Obtaining optimal ranking and selection procedures under heteroscedasticity;

3. Simultaneous confidence intervals for all distances from the best and/or worst populations, where the best (worst) population is the one corresponding to the largest (smallest) value of the parameter;

4. Estimation of ranked parameters when the ranking between parameters is not known a priori;

5. Estimation of (random) parameters of the populations selected using a given decision rule for ranking and selection problems.

Faculty: Neeraj Mishra

In many practical situations, it is natural to restrict the parameter space. The additional information carried by a restricted parameter space can be used to derive estimators that improve upon the standard (natural) estimators meant for the unrestricted case. We deal with problems of estimating the parameters of one or more populations when it is known a priori that some or all of them satisfy certain restrictions. The goal is to find estimators that improve upon the standard estimators, and we also study the decision theoretic aspects of this problem.
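
A minimal sketch of the idea, under assumptions chosen for illustration: the population is normal with unit variance and its mean is known to be non-negative. The sample mean ignores the restriction, while truncating it at zero never increases the squared error when the true mean is non-negative. The simulation settings and function names are hypothetical.

```python
import random
import statistics


def restricted_mean_estimate(sample, lower_bound=0.0):
    """Sample mean truncated at a known lower bound for the parameter."""
    return max(statistics.fmean(sample), lower_bound)


def compare_mse(theta, n, replications, seed=0):
    """Monte Carlo mean squared error of the plain and the truncated sample mean."""
    rng = random.Random(seed)
    se_plain = 0.0
    se_restricted = 0.0
    for _ in range(replications):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        plain = statistics.fmean(sample)
        restricted = restricted_mean_estimate(sample)
        se_plain += (plain - theta) ** 2
        se_restricted += (restricted - theta) ** 2
    return se_plain / replications, se_restricted / replications


# Hypothetical setting: true mean close to the boundary of the restricted space.
mse_plain, mse_restricted = compare_mse(theta=0.1, n=5, replications=2000)
print("plain:", mse_plain, "restricted:", mse_restricted)
```

The gain is largest when the true parameter lies near the boundary of the restricted space and vanishes as it moves far into the interior.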

Faculty: Neeraj Mishra

The outcome of any experiment depends on several variables, and this dependence involves some randomness that can be characterized by a statistical model. The statistical tools of regression analysis help in determining such relationships from sample experimental data, which in turn helps in describing the behaviour of the process involved in the experiment. These tools can be applied in the social sciences, basic sciences, engineering sciences, medical sciences, etc. The unknown form of the relationship among the variables can be linear or nonlinear and has to be determined from a sample of experimental data only. The tools of regression analysis determine such relationships under some standard statistical assumptions. In many experimental situations, however, the data do not satisfy these assumptions: the input variables may be linearly related, leading to the problem of multicollinearity; the output data may not have constant variance, giving rise to the heteroskedasticity problem; the parameters of the model may be subject to restrictions; the output data may be autocorrelated; some data on input and/or output variables may be missing; or the data on input and output variables may not be correctly observable but contaminated with measurement errors. Different types of models, including econometric models such as multiple regression models, restricted regression models, missing data models, panel data models, time series models, measurement error models, simultaneous equation models and seemingly unrelated regression equation models, are employed in such situations. New statistical tools are therefore needed to detect such problems, to analyse non-standard data in different models, and to find the relationships among variables under non-standard statistical conditions.
The development of such tools and the study of their theoretical statistical properties using finite sample theory and asymptotic theory supplemented with numerical studies based on simulation and real data are the objectives of the research work in this area.
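
As one example of detecting a departure from the standard assumptions, multicollinearity is often screened with the variance inflation factor (VIF). The sketch below covers only the special case of two predictors, where the VIF equals 1/(1 − r²) with r the correlation between them. The data and function names are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    r = pearson_correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: the second is almost a copy of the first.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = vif_two_predictors(x1, x2)
print("VIF:", vif)
```

A VIF of 1 means the predictors are uncorrelated. Values above roughly 10 are a commonly quoted rule of thumb for problematic multicollinearity, although the threshold is a convention rather than a theorem.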

Faculty: Shalabh

Signal processing may broadly be considered to involve the recovery of information from physical observations. The received signals are usually disturbed by thermal, electrical, atmospheric or intentional interference. Due to the random nature of the signal, statistical techniques play an important role in signal processing. Statistics is used in the formulation of appropriate models to describe the behaviour of the system, the development of appropriate techniques for estimating the model parameters, and the assessment of model performance. Statistical signal processing basically refers to the analysis of random signals using appropriate statistical techniques. Different one-dimensional and multidimensional models have been used to analyze the corresponding signals. For example, ECG and EEG signals, or different grey and white or colour textures, can be modelled quite effectively using different non-linear models. Effective modelling is very important for compression as well as for prediction. The important issues are to develop efficient estimation procedures and to study their properties. Due to non-linearity, finite sample properties of the estimators cannot be derived; most of the results are asymptotic in nature. Extensive Monte Carlo simulations are generally used to study the finite sample behaviour of the different estimators.
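
A Monte Carlo study of this kind can be sketched as follows. The model is an assumption made for illustration: a single cosine with unit amplitude and unknown frequency observed in Gaussian noise, with the frequency estimated by least squares over a grid. The settings and function names are hypothetical.

```python
import math
import random


def simulate_frequency_estimates(true_freq=0.5, n=40, noise_sd=0.3,
                                 replications=50, seed=1):
    """Return grid-search least squares frequency estimates from simulated data."""
    rng = random.Random(seed)
    grid = [0.2 + 0.002 * k for k in range(301)]  # candidate frequencies
    times = range(1, n + 1)
    # Noiseless signal for every candidate frequency, computed once.
    templates = [[math.cos(w * t) for t in times] for w in grid]
    signal = [math.cos(true_freq * t) for t in times]
    estimates = []
    for _ in range(replications):
        observed = [s + rng.gauss(0.0, noise_sd) for s in signal]
        # Pick the candidate with the smallest residual sum of squares.
        best = min(range(len(grid)),
                   key=lambda g: sum((y - m) ** 2
                                     for y, m in zip(observed, templates[g])))
        estimates.append(grid[best])
    return estimates


estimates = simulate_frequency_estimates()
bias = sum(estimates) / len(estimates) - 0.5
mse = sum((e - 0.5) ** 2 for e in estimates) / len(estimates)
print("bias:", bias, "mse:", mse)
```

Repeating the experiment for several sample sizes and noise levels gives an empirical picture of how quickly the finite sample behaviour approaches the asymptotic results.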

Faculty: Debasis Kundu, Amit Mitra

Efficient estimation of the parameters of nonlinear regression models is a fundamental problem in applied statistics. An isolated large value in the random noise associated with the model, referred to as an outlier or an atypical observation, may be of interest in itself, but it should ideally not influence estimation of the regular pattern exhibited by the model; the statistical method of estimation should be robust against outliers. The nonlinear least squares estimators are sensitive to the presence of outliers in the data and to other departures from the underlying distributional assumptions. The natural choice of estimation technique in such a scenario is the robust M-estimation approach. Studying the asymptotic properties of M-estimators under different choices of the M-estimation function and different noise distribution assumptions is an interesting problem. It is further observed that a number of important nonlinear models used to model real life phenomena have a nested superimposed structure. It is thus also desirable to have robust order estimation techniques and to study their asymptotic properties. While the asymptotic properties of robust model selection techniques for linear regression models are well established in the literature, it is an important and challenging problem to design robust order estimation techniques for nonlinear nested models and to establish their asymptotic optimality properties. Furthermore, the study of the asymptotic properties of robust M-estimators as the number of nested superimposing terms increases is also an important problem. Huber and Portnoy established the asymptotic behaviour of M-estimators when the number of components in a linear regression model is large, and gave conditions under which consistency and asymptotic normality hold. It is possible to derive conditions under which similar results hold for different nested nonlinear models.
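
The robustness of M-estimation can be illustrated in the simplest setting, a location parameter, rather than the nonlinear regression models discussed above. The sketch below computes a Huber M-estimate by iteratively reweighted averaging. The tuning constant 1.345, the scale estimate based on the median absolute deviation, and the data are assumptions made for illustration.

```python
import statistics


def huber_location(data, tuning=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging."""
    mu = statistics.median(data)
    # Robust scale: median absolute deviation, rescaled for the normal model.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = 1.4826 * mad
    if scale == 0.0:
        return mu
    k = tuning * scale
    for _ in range(max_iter):
        # Full weight inside [-k, k]; large residuals are downweighted.
        weights = [1.0 if abs(x - mu) <= k else k / abs(x - mu) for x in data]
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical data: six regular observations and one gross outlier.
data = [1.0, 1.2, 0.9, 1.1, 0.8, 1.0, 50.0]
print("mean:", statistics.fmean(data), "huber:", huber_location(data))
```

The sample mean is pulled far towards the outlier, while the Huber estimate stays close to the bulk of the data. The same reweighting idea carries over to regression, where the weights apply to residuals from the fitted model.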

Faculty: Debasis Kundu, Amit Mitra

Econometric modelling involves the analytical study of complex economic phenomena with the help of sophisticated mathematical and statistical tools. The size of a model typically varies with the number of relationships and variables it attempts to replicate and simulate in a regional, national or international economic system. The methodologies and techniques, on the other hand, address the basic purpose of the model: understanding the relationships, forecasting over a future horizon and/or building “what-if” type scenarios. Econometric modelling techniques are not confined to macro-economic theory; they are also widely applied to model building in micro-economics, finance and various other basic and social sciences. Successful estimation and validation of a model rely heavily on a proper understanding of the asymptotic theory of statistical inference. A challenging area of econometric modelling has been the application of the advanced mathematical concept of wavelets, which are ideally suited to studying the chaotic behaviour of financial indicators, to name just one use. A successful combination of econometrics with non-parametric artificial intelligence techniques is another interesting aspect of the modelling exercise. So, whether the purpose is to validate or negate age-old theories in the contemporary world, or to propagate new ideas in the ever-growing complexities of physical phenomena, econometric modelling provides an ideal solution.

Faculty: Shalabh, Sharmishtha Mitra

Economic globalization and the evolution of information technology have in recent times led to a huge volume of financial data being generated and accumulated at an unprecedented pace. Effective and efficient utilization of this massive amount of financial data, using automated data-driven analysis and modelling to help in strategic planning, investment, risk management and other decision-making goals, is of critical importance. Data mining techniques have been used to extract hidden patterns and to predict future trends and behaviours in financial markets. Data mining is an interdisciplinary field that brings together techniques from machine learning, pattern recognition, statistics, databases and visualization to address the issue of information extraction from such large databases. Advanced statistical, mathematical and artificial intelligence techniques are typically required for mining such data, especially high frequency financial data. Solving complex financial problems using wavelets, neural networks, genetic algorithms and statistical computational techniques is thus an active area of research for researchers and practitioners.

Faculty: Amit Mitra, Sharmishtha Mitra

Traditionally, life-data analysis involves analysing time-to-failure data obtained under normal operating conditions. However, such data are difficult to obtain because of the long durability of modern-day products, the short time gap between designing, manufacturing and actually releasing such products in the market, etc. Given these difficulties, as well as the ever-increasing need to observe failures of products in order to better understand their failure modes and life characteristics in today’s competitive scenario, attempts have been made to devise methods that force these products to fail more quickly than they would under normal use conditions. Various methods have been developed to study such “accelerated life testing” (ALT) models. Step-stress modelling is a special case of ALT in which one or more stress factors are applied in a life-testing experiment and are changed according to a pre-decided design. The failure data, observed as order statistics, are used to estimate the parameters of the distribution of failure times under normal operating conditions. The process requires a model relating the level of stress to the parameters of the failure distribution at that stress level. The difficulty of the estimation procedure depends on several factors, such as the lifetime distribution and its number of parameters, whether the data are uncensored or subject to a censoring scheme (Type I, Type II, Hybrid, Progressive, etc.), and whether non-Bayesian or Bayesian estimation procedures are applied.
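
How censoring enters the estimation can be seen in the simplest case. The sketch below assumes exponential lifetimes at a single stress level with Type II censoring, where the test stops at the r-th failure out of n units. The maximum likelihood estimate of the mean lifetime is then the total time on test divided by r. The data and the function name are hypothetical, and a step-stress model would additionally need a link between stress level and lifetime parameters.

```python
def exponential_mle_type2(failure_times, n_units):
    """MLE of the exponential mean from the r smallest failure times of n_units."""
    ordered = sorted(failure_times)
    r = len(ordered)
    if r == 0 or r > n_units:
        raise ValueError("need between 1 and n_units observed failures")
    # Surviving units contribute the time at which the test was stopped.
    total_time_on_test = sum(ordered) + (n_units - r) * ordered[-1]
    return total_time_on_test / r


# Hypothetical test: 5 units on test, stopped at the 3rd failure.
theta_hat = exponential_mle_type2([1.0, 2.0, 3.0], n_units=5)
print("estimated mean lifetime:", theta_hat)
```

With other lifetime distributions or censoring schemes the estimate usually has no closed form, which is part of what makes the estimation problem harder.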

Faculty: Debasis Kundu, Sharmishtha Mitra