
5.2018 - Record linkage of medical data

Franco-Vietnamese Master's Internship
Record linkage of medical data

Supervisors:
  • G. Chauvet (Professor, ENSAI)
  • V. Gares (Assistant professor, INSA)
  • André Happe (REPERES team)
Research unit:    IRMAR-INSA (Rennes)
Contact:    valerie.gares@insa-rennes.fr, Guillaume.CHAUVET@ensai.Fr
Keywords:    Statistics, record linkage, probabilistic models, optimal transportation.
The National Health Data System ("Système national des données de santé", SNDS) gathers the main national health databases existing in France, i.e. the health information of more than 65 million people. It is currently one of the largest health databases in the world. The SNDS includes health insurance data, hospitalization data, medical cause-of-death data, disability data and sampled data from supplementary health insurance organizations.
The SNDS data can be used to enrich existing cohorts or medical registers. The objective is to link de-identified research datasets at the patient level, when no personal health identifiers such as name or date of birth are available.
Deterministic approaches can be satisfactory when the combination of several individual covariates yields a unique identifier per patient. When no unique patient identifier is available, alternative approaches are needed. Optimal transport is a promising method for that purpose and will be thoroughly investigated in this internship. Optimal transport aims at minimizing the cost of transporting the joint distribution of all available patient explanatory variables from one dataset to the other. Individual matching probabilities are then computed in a second step. This method requires adaptations before it can be applied to SNDS data, especially because of the large variety of data types present in the datasets (dates, numerical values, categories). These adaptations are the objective of this work. A PhD is possible after the internship.
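To make the two-step idea concrete, here is a minimal Python sketch on toy data. It is only an illustration under stated assumptions, not the method of the references or of the internship: the records, the `mixed_cost` function (which simply adds a date gap, a numerical gap and a category mismatch) and the regularization value are all hypothetical, and the transport plan is approximated with entropic regularization (Sinkhorn iterations) rather than an exact solver. Matching probabilities are taken here as the row-normalized transport plan, which is one possible convention.

```python
import math
from datetime import date

# Hypothetical toy records: (birth date, weight in kg, sex).
dataset_a = [
    (date(1950, 3, 12), 70.0, "F"),
    (date(1982, 7, 1), 85.5, "M"),
    (date(1975, 11, 23), 62.0, "F"),
]
dataset_b = [
    (date(1975, 11, 20), 61.5, "F"),
    (date(1950, 3, 12), 71.0, "F"),
    (date(1982, 7, 3), 85.0, "M"),
]


def mixed_cost(rec_a, rec_b):
    """Illustrative cost mixing a date, a numerical value and a category."""
    date_gap = abs((rec_a[0] - rec_b[0]).days) / 365.25  # gap in years
    weight_gap = abs(rec_a[1] - rec_b[1]) / 10.0         # gap in units of 10 kg
    sex_gap = 0.0 if rec_a[2] == rec_b[2] else 1.0       # category mismatch
    return date_gap + weight_gap + sex_gap


def sinkhorn(cost, reg=0.05, n_iter=200):
    """Entropy-regularized transport plan between two uniform distributions."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # uniform mass on records of the first dataset
    b = [1.0 / m] * m  # uniform mass on records of the second dataset
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Step 1: pairwise cost matrix, rescaled to [0, 1] to keep exp() well behaved.
raw_cost = [[mixed_cost(ra, rb) for rb in dataset_b] for ra in dataset_a]
max_cost = max(max(row) for row in raw_cost)
cost = [[c / max_cost for c in row] for row in raw_cost]
plan = sinkhorn(cost)

# Step 2: individual matching probabilities, here the row-normalized plan.
match_probs = [[p / sum(row) for p in row] for row in plan]
best_matches = [row.index(max(row)) for row in match_probs]

for i, j in enumerate(best_matches):
    print(f"record A{i} -> record B{j} (probability {match_probs[i][j]:.3f})")
```

On this toy data the near-identical records are paired together. With real SNDS-like data, the choice of the cost for each variable type and the relative weights between variables would have to be designed and validated, which is precisely the kind of adaptation mentioned above.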
This internship will be carried out in association with the REPERES team (REcherche en Pharmaco-Epidémiologie et REcours aux Soins), which works on the analysis of consumption, use and impact of care, including the prescription of health products (drugs, medical devices) at the population level.

REFERENCES
  [1] Dimeglio C.*, Garès V.*, Kosorok M. R., Guernec G., Fantin R., Lepage B. and Savy N. On the use of optimal transportation theory to merge databases. Application to clinical trials. Submitted to the International Journal of Biostatistics.
  [2] Boris P. Hejblum, Griffin M. Weber, Katherine P. Liao, Nathan P. Palmer, Susanne Churchill, Peter Szolovits, Shawn N. Murphy, Isaac S. Kohane, Tianxi Cai. Probabilistic Record Linkage of De-Identified Research Datasets with Discrepancies Using Diagnosis Codes. Journal of the American Statistical Association.
  [3] Fellegi, I. P. and Sunter, A. B. A Theory for Record Linkage. Journal of the American Statistical Association, 64, 1183-1210 (1969).