
Occasional Paper

Modelling Census Under-Enumeration - A Logistic Regression Perspective

Results - Hard to Count Index Model

The Hard to Count Index was constructed from Census variables known to be associated with under-enumeration. Under-enumeration was also found to be unevenly spread across certain groups. Since this was the basis of the Census Coverage Survey design, these variables are likely to be significant predictors of the percentage imputed.

The following variables were used in the logistic regression analysis:

  • Shared dwelling
  • Private rented accommodation
  • Percentage in the following age groups: under 4, 20-24, 25-29 and over 85
  • Over-crowded households
  • Students

Here, ‘shared dwellings’ is the variable that captures whether people living in dwellings occupied by more than one household have a high probability of being under-enumerated.

Of the variables entered into the initial model, only private rented accommodation, multi-occupancy (shared dwellings), over-crowded households and people aged 25-29 show a significant relationship with under-enumeration, based on the maximum likelihood estimates.
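As a rough illustration of how significance can be read off a maximum likelihood estimate, the sketch below computes a Wald statistic and its two-sided p-value. It assumes a Wald test was used, which the text does not state; the function name `wald_test` and the input values are hypothetical and are not taken from Table 4.2.1 or Model 3.

```python
import math


def wald_test(coefficient, standard_error):
    """Return the Wald z statistic and its two-sided p-value (standard normal)."""
    z = coefficient / standard_error
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, p = wald_test(0.5, 0.2)
print("z = %.2f, p = %.4f" % (z, p))
```

A variable would then be called significant when the p-value falls below the chosen threshold (0.05 is a common choice, again an assumption here).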

The Hard to Count Index Variables can be seen in Table 4.2.1 (12 Kb PDF file).

In Model 3 (19 Kb PDF file), over_crowd has a large negative coefficient, which gives cause for further investigation of the linearity assumption.

Fitting single-variable (univariate) models rarely provides an adequate analysis of the data, as the independent variables are usually associated with each other and may have different distributions within the levels of the outcome variable. A multivariable analysis gives more comprehensive modelling of the data. In multivariable logistic modelling, each estimated coefficient provides an estimate of the log odds associated with that variable, adjusted for all other variables included in the model. It is therefore of interest to investigate how the odds ratio estimates change between the single-variable and multivariable models.
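For reference, the standard form of the logistic model that this interpretation rests on can be written as below. The symbols are assumptions introduced for illustration and do not appear in the paper: p is the probability of under-enumeration, x_1, ..., x_k are the predictors and β_j are the coefficients.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1-p}
  = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k,
\qquad
\mathrm{OR}_j = e^{\beta_j}
```

Under this form, β_j is the change in the log odds for a one-unit increase in x_j with the other predictors held fixed, and exponentiating it gives the adjusted odds ratio.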

The results show that while the number of students and the number of persons aged 20-24 are significant predictors of under-enumeration in the single-variable models, they are not significant at the multivariable level. Students are known to be transient and hence more difficult to enumerate. However, most students are likely to live in (private) rented accommodation, so the inclusion of the variable ‘rent’ renders ‘students’ redundant. The same applies to the variable ‘_20to24_’. In other words, the number of residents in rented accommodation is itself a significant predictor of the numbers of students and persons aged 20-24.
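The effect described here, where a predictor looks important on its own but becomes redundant once a correlated variable is included, can be reproduced on simulated data. The sketch below is not the paper's analysis: the data are randomly generated, the probabilities are invented, and the names `rent`, `student`, `missed` and `fit_logistic` are hypothetical. Under-enumeration is made to depend on renting only, while students are mostly renters.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(rows, y, iterations=15):
    """Maximum likelihood fit by Newton-Raphson; each row starts with 1 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX, the negative Hessian
        for x, yi in zip(rows, y):
            p = sigmoid(sum(b * v for b, v in zip(beta, x)))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Simulated people (all probabilities are invented for illustration).
random.seed(0)
rent, student, missed = [], [], []
for _ in range(10000):
    r = 1 if random.random() < 0.3 else 0
    s = 1 if random.random() < (0.7 if r else 0.05) else 0  # students mostly rent
    m = 1 if random.random() < sigmoid(-3.0 + 1.5 * r) else 0  # depends on rent only
    rent.append(r)
    student.append(s)
    missed.append(m)

single = fit_logistic([[1.0, s] for s in student], missed)
multi = fit_logistic([[1.0, r, s] for r, s in zip(rent, student)], missed)

print("students alone:         odds ratio = %.2f" % math.exp(single[1]))
print("students, given rent:   odds ratio = %.2f" % math.exp(multi[2]))
print("rent, given students:   odds ratio = %.2f" % math.exp(multi[1]))
```

In this simulation the odds ratio for students should sit well above 1 in the single-variable fit and move close to 1 once rent is included, while the rent odds ratio stays large. This mirrors the pattern reported for ‘students’ and ‘_20to24_’ but says nothing about the size of the effects in the Census data.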

However, ‘_25to29_’ remains predictive in the multivariable model, perhaps because people in this age group are much more likely to be on the first rung of the property ladder and to live alone.



Page last updated: 17 October 2006



© Crown Copyright 2008