Statistical disclosure control

Statistical disclosure control (SDC) is a technique used in data-driven research to ensure that no person or organisation can be identified from the results of an analysis of survey or administrative data, thereby protecting the confidentiality of respondents and research subjects.[1] There are two main approaches to SDC: principles-based and rules-based.[2]

Why is it necessary?

Many kinds of social, economic and health research rely on potentially sensitive data, such as survey or census data, tax records, health records and educational information. Such information is usually given in confidence and, in the case of administrative data, not always for the purpose of research.

Researchers are not usually interested in information about a single person or business; they are looking for trends among larger groups.[3] However, the data they use originates from individual people and businesses, and SDC aims to ensure that these cannot be identified from published results, however detailed or broad.[4]

It is possible for an analysis to single out one person or business unintentionally. For example, a researcher may identify exceptionally good or bad service in a geriatric department within a hospital in a remote area, where only one hospital provides such care. In that case, the data analysis ‘discloses’ the identity of the hospital, even if the dataset used for the analysis was properly anonymised or de-identified.

Statistical disclosure control identifies this kind of disclosure risk and ensures that the results of the analysis are altered to protect confidentiality.[5] It requires a balance between protecting confidentiality and keeping the results of the data analysis useful for statistical research.[6]

Rules-based Statistical Disclosure Control

In rules-based SDC, a rigid set of rules determines whether or not the results of data analysis can be released. The rules are applied consistently, which makes it clear what kinds of output are acceptable. However, because the rules are inflexible, they can fail in two ways: disclosive information may still slip through, or the rules may be so restrictive that only results too broad for useful analysis can be published.[2]
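As an illustration only, a rigid rule of this kind can be sketched as a fixed minimum cell count applied to a frequency table. The threshold of 10, the function names and the counts below are assumptions made for this example and are not taken from any agency's rulebook.

```python
from typing import Dict, List

# Hypothetical threshold, assumed for illustration only.
MIN_CELL_COUNT = 10


def failing_cells(table: Dict[str, int], threshold: int = MIN_CELL_COUNT) -> List[str]:
    """Return the labels of cells whose count is below the threshold."""
    return sorted(label for label, count in table.items() if count < threshold)


def can_release(table: Dict[str, int], threshold: int = MIN_CELL_COUNT) -> bool:
    """Rigid rule: release the table only if no cell is below the threshold."""
    return not failing_cells(table, threshold)


# Made-up counts of hospitals offering geriatric care, by type of area.
hospitals_by_area = {"urban": 42, "suburban": 17, "remote": 1}

print(failing_cells(hospitals_by_area))  # ['remote']
print(can_release(hospitals_by_area))    # False
```

The check is mechanical and gives the same answer every time, which reflects the consistency described above. It also takes no account of context, which is where the inflexibility comes from.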

The Northern Ireland Statistics and Research Agency uses a rules-based approach to releasing statistics and research results.[7]

Principles-based Statistical Disclosure Control

In principles-based SDC, both the researcher and the output checker are trained in SDC. They receive a set of rules, but these are rules of thumb rather than the hard rules of rules-based SDC, so in principle any output may be approved or refused. The rules of thumb give the researcher a starting point and make clear from the outset which outputs would be deemed safe and non-disclosive, and which would be deemed unsafe. It is up to the researcher to show that any ‘unsafe’ output is non-disclosive, but the checker has the final say. Since there are no hard rules, both the researcher and the checker need specialist knowledge of disclosure risks. The approach encourages the researcher to produce safe results in the first place. However, it also means that outcomes may be inconsistent and uncertain, and it requires extensive training and a strong understanding of statistics and data analysis.[2]
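The decision flow described above can be sketched roughly as follows. This is a simplified model under stated assumptions: the rule of thumb is a hypothetical minimum cell count of 10, and the `Output` class, its fields and the `review` function are invented for this example rather than drawn from any real output-checking system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule of thumb, assumed for illustration only.
RULE_OF_THUMB_MIN_COUNT = 10


@dataclass
class Output:
    description: str
    smallest_cell_count: int
    # The researcher's argument that a flagged output is non-disclosive.
    justification: Optional[str] = None


def meets_rule_of_thumb(output: Output) -> bool:
    """The rule of thumb is a starting point, not a verdict."""
    return output.smallest_cell_count >= RULE_OF_THUMB_MIN_COUNT


def review(output: Output, checker_decision: Optional[bool] = None) -> bool:
    """Return True if the output is released.

    checker_decision is None when the checker raises no objection and
    gives no explicit approval; True or False is an explicit decision.
    """
    if meets_rule_of_thumb(output):
        # Deemed safe by default, but the checker may still refuse it.
        return checker_decision is not False
    # Flagged as 'unsafe': the researcher must make a case,
    # and the checker must explicitly accept it.
    return output.justification is not None and checker_decision is True


safe = Output("regional averages", smallest_cell_count=25)
flagged = Output(
    "single remote hospital",
    smallest_cell_count=1,
    justification="figure already published by the hospital",
)

print(review(safe))                             # True
print(review(safe, checker_decision=False))     # False
print(review(flagged))                          # False
print(review(flagged, checker_decision=True))   # True
```

Unlike a rigid rule, the same output can end up approved or refused depending on the justification offered and the checker's judgement, which is the source of both the flexibility and the inconsistency noted above.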

The UK Data Service employs a principles-based approach to statistical disclosure control.[8]

References

  1. Skinner, Chris (2009). "Statistical Disclosure Control for Survey Data" (PDF). Handbook of Statistics Vol 29A: Sample Surveys: Design, Methods and Applications. Retrieved March 2016.
  2. Ritchie, Felix, and Elliott, Mark (2015). "Principles- Versus Rules-Based Output Statistical Disclosure Control In Remote Access Environments" (PDF). IASSIST Quarterly v39 pp5-13. Retrieved March 2016.
  3. "ADRN » Safe results". adrn.ac.uk. Retrieved 2016-03-08.
  4. "Government Statistical Services: Statistical Disclosure Control". Retrieved March 2016.
  5. Templ, Matthias; et al. (2014). "International Household Survey Network" (PDF). IHSN Working Paper. Retrieved March 2016.
  6. "Archived: ONS Statistical Disclosure Control". Office for National Statistics. Retrieved March 2016.
  7. "Census 2001 - Methodology" (PDF). Northern Ireland Statistics and Research Agency. 2001. Retrieved March 2016.
  8. Afkhami, Reza; et al. (2013). "Statistical Disclosure Control Practice in the Secure Access of the UK Data Service" (PDF). United Nations Economic Commission for Europe. Retrieved March 2016.
This article is issued from Wikipedia - version of the 8/22/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.