
Occasional Papers

Modelling Census Under-Enumeration - A Logistic Regression Perspective

3. Preliminary Analysis

The outcome variable: proportion imputed in each ward
As Figure 3.1.1 (14 Kb PDF file) and Figure 3.1.2 (11 Kb PDF file) show, the outcome variable is skewed, and the number of people imputed varied from area to area. Any analysis that assumes normality could therefore produce flawed results. Logistic regression does not require the outcome variable to be normally distributed and is thus a suitable modelling tool for gaining insight into which attributes make the event outcome more or less likely – in this case, whether or not a person is imputed.
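
For reference, the general form of a logistic regression model can be written as below. The notation is introduced here for illustration and is not taken from the paper: p_i stands for the probability that a person in ward i is imputed, x_ij for the value of the j-th explanatory variable in that ward, and the beta terms for the coefficients to be estimated.

```latex
\[
\ln\!\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}
\]
```

Under this form, exp(beta_j) is the odds ratio associated with a one-unit increase in the j-th variable, holding the other variables fixed.
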
The independent variables

The analysis in this paper examined data collected from two sources – the Scottish Indices of Multiple Deprivation and the 2001 Census.

The Census edit and imputation process – which identified the “synthetic individuals” – was based on a Hard to Count (HtC) Index derived to account for the disproportionate distribution of under-enumerated households. Therefore, the variables that were used to construct the HtC Index could unduly influence the logistic modelling procedure.

To allow for that influence, the Census variables were divided into:
  • Hard to count variables
  • Other Census variables
The Hard to Count variables

shared – shared dwellings
rent – private rented accommodation
over_crowd – overcrowded households
students – students as defined by the National Statistics Socio-economic Classification (NS-SeC)
_0to4_ – residents aged 0-4
_20to24_ – residents aged 20-24
_25to29_ – residents aged 25-29
over_85 – residents aged over 85

Other Census variables

Group_0 – ‘no’ educational qualifications
council – Council accommodation
house_assoc – other social rented accommodation
sin_par_f – female single parents
NS_SeC_3 – intermediate occupations
no_car – non-ownership of a car/van
floor – lowest floor level
single – single (never married)
density – population density

A series of logistic regression analyses was carried out on the three sets of variables – Deprivation, Hard to Count and Other Census – to yield three logistic models. First, univariate models were fitted, in which the effect of each variable was considered separately to determine how significant it was as a predictor of the outcome variable. Multivariate models were then fitted. Lastly, a final model was developed to identify which of the variables pre-determined in the earlier steps are significant and independent predictors of the binary outcome variable: whether or not a person is imputed.
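
The univariate screening step described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure or software used for the paper: the function name fit_univariate_logit is hypothetical, the ward figures are invented, and Newton-Raphson is simply one standard way of maximising the logistic likelihood for grouped counts. Only the variable name rent comes from the list of Hard to Count variables.

```python
import math


def fit_univariate_logit(x, imputed, total, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to ward-level counts by Newton-Raphson.

    x       -- one explanatory value per ward
    imputed -- number of imputed persons per ward
    total   -- number of persons per ward
    Returns (b0, b1, standard error of b1).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for xi, yi, ni in zip(x, imputed, total):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p
            weight = ni * p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * step = score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    # Var(b1) is the matching diagonal entry of the inverse information
    # matrix, taken from the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return b0, b1, se_b1


# Hypothetical data: four wards with an invented percentage of
# private rented accommodation and invented imputation counts.
rent = [2.0, 5.0, 9.0, 14.0]
imputed = [12, 20, 35, 60]
total = [1000, 1000, 1000, 1000]

b0, b1, se = fit_univariate_logit(rent, imputed, total)
print(f"odds ratio per unit of rent: {math.exp(b1):.3f}")
print(f"Wald z statistic: {b1 / se:.2f}")
```

Repeating the fit for each candidate variable in turn gives one odds ratio and one Wald statistic per variable, which is the kind of evidence a univariate stage would use to decide what to carry forward. A multivariate fit needs the same iteration with a full coefficient vector and is usually left to a statistics package.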



Page last updated: 17 October 2006



© Crown Copyright 2008