NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents
ClinicalTrials.gov Identifier: NCT02795806
Recruitment Status: Enrolling by invitation
First Posted: June 10, 2016
Last Update Posted: May 10, 2019
Background: Electronic health records contain a vast amount of data about diseases and treatments. Researchers could use these data to test their ideas, but they would need records from more than just their own group of patients. However, access to those records is restricted to protect patient privacy.
The U.S. National Library of Medicine (NLM) has created a computer tool called NLM Scrubber. This program recognizes and deletes personal information from health records. The researchers who developed the program now need access to the original records. This will allow them to see how well the program removes personal information from patient records and how they can make it more accurate.
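To make "recognizes and deletes personal information" concrete, here is a toy sketch in Python. It is not NLM Scrubber's method, which this record does not describe; it assumes two hypothetical regular-expression patterns, one for Social Security numbers and one for numeric dates, and replaces each match with a bracketed type tag.

```python
import re

# Hypothetical patterns for two of the HIPAA identifier types. A real
# de-identification tool needs far more than a handful of regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),     # e.g. 123-45-6789
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),  # e.g. 03/14/2015
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed type tag."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    note = "Pt seen 03/14/2015. SSN 123-45-6789. BP 120/80."
    print(redact(note))  # Pt seen [DATE]. SSN [SSN]. BP 120/80.
```

A date written as "March 14, 2015" would pass through this sketch unredacted, which is exactly the kind of miss that comparing automatic output against manual redaction is meant to expose.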
Objective: To find ways to improve clinical text de-identification.
Eligibility: No new participants will be enrolled. Researchers will review data that have already been collected.
Design:
- Researchers will collect a random sample of reports written by different doctors in different specialties.
- Researchers will manually remove personal information from the records.
- Researchers will also use NLM Scrubber to remove personal information automatically from the original records.
- Researchers will compare the program's output with the manual redactions. They will note where the program failed to remove personal information and where it wrongly deleted non-personal health information.
- Researchers will use the results to revise the program, and they will keep testing it until the de-identification process is complete.
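The comparison step can be pictured as a token-by-token check of the automatic output against the manual annotation. This is a minimal sketch under assumptions the study does not state: each report is already split into tokens, and `Token`, `compare`, and the sample tokens are hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    is_pii: bool    # gold label from the manual (human) annotation
    redacted: bool  # whether the automatic tool removed this token


def compare(tokens: list[Token]) -> tuple[list[str], list[str]]:
    """Return (missed PII tokens, wrongly redacted non-PII tokens)."""
    missed = [t.text for t in tokens if t.is_pii and not t.redacted]
    over_redacted = [t.text for t in tokens if t.redacted and not t.is_pii]
    return missed, over_redacted


if __name__ == "__main__":
    doc = [
        Token("John", is_pii=True, redacted=True),
        Token("Smith", is_pii=True, redacted=False),
        Token("has", is_pii=False, redacted=False),
        Token("Parkinson", is_pii=False, redacted=True),
        Token("disease", is_pii=False, redacted=False),
    ]
    missed, over = compare(doc)
    print("missed:", missed)       # missed: ['Smith']
    print("over-redacted:", over)  # over-redacted: ['Parkinson']
```

The second list shows why over-redaction matters: an eponymous disease name such as "Parkinson" looks like a personal name, and removing it deletes clinical information rather than an identifier.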
Condition or disease: Personally Identifiable Information
This study concerns the quality assessment, improvement, and monitoring of NLM Scrubber, an automatic clinical text de-identification software application developed at the National Library of Medicine (NLM). The application was developed so that clinical reports can be used in secondary scientific studies without breaching patient privacy. Research on methods for protecting patient privacy and the development of NLM Scrubber have been conducted following the guidelines of, and in compliance with, HIPAA and the Privacy Act.
To develop and improve NLM Scrubber further and to assess its de-identification performance effectively, the investigators require original (unredacted) samples of all potential clinical report types and sources. To this end, NLM investigators have been collaborating with entities within NIH, namely the NIH Clinical Center, BTRIS, and NCI, as well as with outside entities: the Kentucky State Registry, administered by the University of Kentucky, and researchers from the University of Pittsburgh, who have stated their interest in integrating NLM Scrubber into their application, the Text Information Extraction System. These entities collect samples of various types of clinical reports for assessing and improving NLM Scrubber's performance. However, we also need access to the original data in order to identify potential problems and improve the accuracy of NLM Scrubber.
Study Type: Observational
Estimated Enrollment: 1 participant
Official Title: NLM Scrubber: NLM's Software Application to De-identify Clinical Text Documents
Study Start Date: June 9, 2016
Estimated Primary Completion Date: September 30, 2026
Estimated Study Completion Date: January 29, 2027
- The rate of de-identification of PII [Time Frame: 01/01/2017-01/31/2027]: The HIPAA Privacy Rule defines 18 types of personally identifying information that need to be de-identified, including personal names, addresses, significant dates, and numeric identifiers (such as Social Security numbers). Our annotators label those words and numbers to create a gold standard, and NLM-Scrubber tries to recognize and eliminate all of them. The rate of de-identification of PII measures how successfully NLM-Scrubber does so.
- The rate of erroneously redacted clinical information [Time Frame: 01/01/2017-01/31/2027]: NLM-Scrubber tries to eliminate only PII elements and preserve non-identifying study data, but it inadvertently deletes some non-identifying study data elements (non-protected health information) as well. The rate of erroneously redacted clinical information measures how often NLM-Scrubber fails to preserve such non-identifying health information.
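The two outcome measures can be read as rates computed against the annotators' gold standard. The record does not give exact formulas, so this sketch assumes token-level definitions: the de-identification rate is the share of gold-standard PII tokens that were redacted (recall), and the erroneous redaction rate is the share of non-PII tokens that were redacted. The function name `deidentification_rates` is hypothetical.

```python
def deidentification_rates(
    is_pii: list[bool], redacted: list[bool]
) -> tuple[float, float]:
    """Return (PII de-identification rate, erroneous redaction rate).

    Assumed token-level definitions:
      de-identification rate   = redacted PII tokens / gold PII tokens
      erroneous redaction rate = redacted non-PII tokens / non-PII tokens
    A rate whose denominator is zero is undefined and returned as NaN.
    """
    if len(is_pii) != len(redacted):
        raise ValueError("is_pii and redacted must have the same length")

    pii_total = sum(is_pii)
    non_pii_total = len(is_pii) - pii_total
    pii_removed = sum(p and r for p, r in zip(is_pii, redacted))
    non_pii_removed = sum((not p) and r for p, r in zip(is_pii, redacted))

    nan = float("nan")
    deid_rate = pii_removed / pii_total if pii_total else nan
    error_rate = non_pii_removed / non_pii_total if non_pii_total else nan
    return deid_rate, error_rate


if __name__ == "__main__":
    gold = [True, True, False, False, False, False]    # annotators' labels
    system = [True, False, False, True, False, False]  # tool's redactions
    print(deidentification_rates(gold, system))  # (0.5, 0.25)
```

Here one of two PII tokens is caught and one of four non-PII tokens is wrongly removed. An alternative reading of the second measure divides by all redacted tokens instead (one minus precision); the record does not say which is intended.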
Location: National Library of Medicine, Bethesda, Maryland, United States
Principal Investigator: Mehmet M. Kayaalp, Ph.D., National Library of Medicine (NLM)