Keeping PHI out of Medical Image Presentations and Educational Products
Advances in search engines’ web-crawling and content-processing technology increasingly enable large-scale extraction of information from previously stored files. This technology can extract the source images contained in PowerPoint™ presentations and Adobe® PDF files and recognize alphanumeric characters embedded in the image pixels, which means that an image with burned-in patient information can be indexed. Once explicit patient information is associated with an image in a search engine’s database, the image can turn up in later internet searches for that patient’s personal information.
Workflow Steps to Consider When Safely Sharing Medical Images in Educational Materials and Publications
Exporting Medical Images
The first point at which to watch for exposure of protected health information (PHI) is when images are exported from the PACS or from another imaging device or application.
Ideally, capture a “region of interest” screenshot that includes only the actual anatomic image pixels. Alternatively, first turn off the DICOM patient information overlay (i.e., use the PACS function that removes or hides overlays) and then take the screenshot.
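The difference between a region-of-interest capture and a full-viewer screenshot can be sketched in Python. This is a minimal illustration rather than PACS code: it assumes the image is a plain list of grayscale pixel rows, and the function name `crop_to_roi` is hypothetical. The point is that a true crop produces a new image containing only the selected pixels, so text outside the region is never carried into the exported file.

```python
from typing import List

# Assumption for illustration: a grayscale image as a list of pixel rows.
Image = List[List[int]]


def crop_to_roi(pixels: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return a new image holding only the pixels inside the region of interest."""
    if not pixels or not pixels[0]:
        raise ValueError("image is empty")
    if top < 0 or left < 0 or height <= 0 or width <= 0:
        raise ValueError("ROI needs a non-negative origin and a positive size")
    if top + height > len(pixels) or left + width > len(pixels[0]):
        raise ValueError("ROI extends beyond the image")
    # Slicing builds fresh row lists, so the result shares no data with the
    # source image: pixels outside the ROI simply do not exist in the output.
    return [row[left:left + width] for row in pixels[top:top + height]]
```

For example, if the top row of a 4×4 image holds burned-in text pixels, `crop_to_roi(img, 1, 0, 3, 4)` returns only the three anatomic rows.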
Every time an image is saved directly from PACS as a file, PHI can enter that file either as patient data burned into the image pixels or, if a DICOM file is saved, as metadata in the DICOM header. Images that contain PHI can still be used, provided the PHI is redacted with appropriate tools and processes.
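Header metadata can be cleaned separately from the pixels. The sketch below models a DICOM header as a Python dictionary keyed by DICOM attribute keywords (PatientName, PatientID, and StudyDate are real keywords); a real workflow would read the file with a DICOM library such as pydicom and follow a published de-identification profile (for example, DICOM PS3.15 Annex E). The allowlist contents and the `scrub_header` name are hypothetical. Keeping only approved attributes, rather than deleting known PHI fields, fails safe when an unexpected or private field appears. This does nothing about PHI burned into the pixels.

```python
from typing import Dict, List, Tuple

# Hypothetical allowlist of attributes judged safe to keep for teaching use.
# Everything else (names, IDs, dates, institution, free-text comments,
# private tags) is dropped.
ALLOWED_KEYWORDS = {
    "Modality", "BodyPartExamined", "Rows", "Columns",
    "PixelSpacing", "SliceThickness",
}


def scrub_header(header: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (kept attributes, sorted keywords of removed attributes)."""
    kept = {k: v for k, v in header.items() if k in ALLOWED_KEYWORDS}
    removed = sorted(k for k in header if k not in ALLOWED_KEYWORDS)
    return kept, removed
```

Reviewing the `removed` list before publishing shows which fields would otherwise have traveled with the image.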
Some image formats, such as JPEG and TIFF, can hold data in Exchangeable Image File Format (Exif) tags (additional information stored together with the pixel data), much as DICOM stores data in its tag structure. A PACS may use these tags to store metadata that must be removed before the image is shared.
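In a JPEG file, Exif and similar metadata live in marker segments that precede the compressed pixel data, so they can be dropped without re-encoding the image. The standard-library sketch below walks those segments and copies everything except APP1 (Exif or XMP), APP13 (Photoshop/IPTC), and COM (comments); that choice of segments and the `strip_jpeg_metadata` name are assumptions for illustration. It handles only the common layout where metadata precedes the first scan and does not touch burned-in pixel text, so a maintained tool is preferable in practice.

```python
import struct

# Segments removed by this sketch: APP1 (Exif/XMP), APP13 (Photoshop/IPTC),
# and COM (free-text comments).
STRIP_MARKERS = {0xE1, 0xED, 0xFE}


def strip_jpeg_metadata(data: bytes) -> bytes:
    """Return a copy of a JPEG with the selected metadata segments removed."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    out = bytearray(b"\xff\xd8")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker: skip it
            pos += 1
            continue
        if marker == 0xD9:  # end of image with no scan data
            out += b"\xff\xd9"
            break
        if pos + 4 > len(data):
            raise ValueError("truncated segment header")
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDA:
            # Start of scan: compressed pixel data follows; copy the rest as-is.
            out += data[pos:]
            break
        if marker not in STRIP_MARKERS:
            out += data[pos:pos + 2 + length]
        pos += 2 + length
    return bytes(out)
```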
Creating the Presentation
The next point to check for PHI is when you create a document or presentation that uses exported medical images.
When users insert medical images into a PowerPoint™ presentation and try to redact burned-in PHI in the pixel data, they must not simply “cover up” the PHI with a mask. Creators often crop the image with the PowerPoint™ crop tool or change the text color so it blends into the background, but neither approach actually removes the information from the file.
Because modern search engine technology can automatically extract the original inserted image and index any PHI it contains, when you crop an image you must explicitly delete the cropped areas so that the crop cannot later be undone. Microsoft has provided instructions on how to delete the cropped areas of a picture and save the file without them.
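Before publishing, you can check whether pictures in a .pptx file still carry cropped-away pixels. A .pptx file is a ZIP package of XML parts, and a picture cropped with PowerPoint’s crop tool typically keeps the full original image while recording the crop as an `<a:srcRect>` element in the slide XML. The sketch below, with the hypothetical name `find_retained_crops`, assumes the standard part layout (`ppt/slides/slideN.xml`) and does not inspect layouts, masters, or embedded objects.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, List, Union

# DrawingML namespace used by <a:srcRect> inside picture fills.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_PATH = re.compile(r"^ppt/slides/slide\d+\.xml$")


def find_retained_crops(pptx: Union[str, BinaryIO]) -> List[str]:
    """List slide parts containing pictures that are cropped for display only."""
    flagged = []
    with zipfile.ZipFile(pptx) as pkg:
        for name in sorted(pkg.namelist()):
            if not SLIDE_PATH.match(name):
                continue
            root = ET.fromstring(pkg.read(name))
            for rect in root.iter(f"{{{A_NS}}}srcRect"):
                # Positive l/t/r/b offsets mean part of the stored image is
                # hidden from view but still present in the file.
                if any(int(rect.get(side, "0")) > 0 for side in ("l", "t", "r", "b")):
                    flagged.append(name)
                    break
    return flagged
```

Any slide this returns is a candidate for PowerPoint’s option to delete cropped areas of pictures before the file is shared.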
Converting to PDF
Presentations and documents are often converted to PDF for sharing, and PDF files can contain PHI in hidden objects as well as in metadata fields. Adobe Acrobat has a “Sanitize” function that helps you identify and remove hidden data.
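A quick triage can flag PDF files that obviously carry metadata or hidden content before you run a proper sanitizing tool. The sketch below, with the hypothetical name `scan_pdf_for_hidden_data`, searches the raw file bytes for a few PDF name objects (document-information keys such as `/Author` and `/Title`, plus `/Metadata`, `/EmbeddedFiles`, `/JavaScript`, and `/Annots`); the choice of names is an assumption. Many PDFs compress objects into streams, so an empty result proves nothing, and this is no substitute for the Sanitize function.

```python
import re
from typing import Set

# PDF name objects that often indicate document metadata or hidden content.
# The negative lookahead stops, e.g., "/Titles" from matching "/Title".
HIDDEN_DATA_NAMES = re.compile(
    rb"/(Author|Title|Subject|Keywords|Metadata|EmbeddedFiles?|JavaScript|Annots)(?![A-Za-z0-9])"
)


def scan_pdf_for_hidden_data(data: bytes) -> Set[str]:
    """Return the flagged PDF names found in the uncompressed parts of the file."""
    return {m.group(1).decode("ascii") for m in HIDDEN_DATA_NAMES.finditer(data)}
```

Typical use is `scan_pdf_for_hidden_data(open("case.pdf", "rb").read())`, where the file name is only an example.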
Additional Tips
- If your image has PHI in the pixel data, consider using third-party image processing software (e.g., IrfanView, Adobe Photoshop, or similar) to cut out the PHI, and then save only the remaining image data.
- Make sure no slide contains PHI in cropped areas; where available, use the presentation software’s function for permanently removing cropped content.
- Make sure no slide contains PHI in its “Notes” section or in areas outside the visible slide.
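Speaker notes are easy to overlook because they do not appear on the slide itself. In a .pptx package, notes are stored as separate XML parts (`ppt/notesSlides/notesSlideN.xml`) whose text sits in DrawingML `<a:t>` elements, so they can be dumped for review with the standard library. This sketch, with the hypothetical name `extract_speaker_notes`, covers notes only, not content placed outside the visible slide area, and notes part numbers are not guaranteed to match slide numbers.

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Union

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
NOTES_PATH = re.compile(r"^ppt/notesSlides/notesSlide\d+\.xml$")


def extract_speaker_notes(pptx: Union[str, BinaryIO]) -> Dict[str, str]:
    """Map each non-empty notes part in a .pptx package to its text."""
    notes = {}
    with zipfile.ZipFile(pptx) as pkg:
        for name in sorted(pkg.namelist()):
            if not NOTES_PATH.match(name):
                continue
            root = ET.fromstring(pkg.read(name))
            # <a:t> elements hold text runs, including typed notes and
            # placeholder text such as slide numbers.
            text = " ".join(t.text or "" for t in root.iter(f"{{{A_NS}}}t"))
            if text.strip():
                notes[name] = text
    return notes
```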
Other Regulatory Implications
If you reside in a European Union member state or another country outside the United States, your use of patient-identifiable information, even for educational purposes, must comply with that jurisdiction’s privacy laws and regulations.
- Resource: What Is Personal Data Under the GDPR?
If you or your organization remove metadata from medical images, be aware that doing so will also strip out alternative text that is often used to meet website accessibility standards. Consult your organization’s legal counsel for specific guidance.
Modern OCR and Indexing by Search Engines
One challenge of publishing files that contain hidden data is that programs crawling the internet can often find that data and expose it without the author ever knowing it was distributed. Modern search engines are particularly adept at combing through publicly available files at scale, quickly uncovering data that the author believed had been removed. In addition, Optical Character Recognition (OCR) applied at scale lets these programs convert PHI that was burned into the image pixels back into searchable text. Search engines can then associate (“index”) the image with that text, making the image discoverable through searches for the PHI and linking it to other text-based information.
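To see why OCR text makes a burned-in name searchable, consider a toy inverted index, the basic data structure behind text search: each word points to the documents that contain it. This is a deliberately simplified model, not a description of how any particular search engine works; the OCR output is supplied as plain strings rather than produced by an OCR engine, and the function names and file names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(ocr_text_by_image: Dict[str, str]) -> Dict[str, Set[str]]:
    """Toy inverted index: map each lowercase word to the images containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, text in ocr_text_by_image.items():
        for word in text.lower().split():
            index[word].add(image_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return images whose indexed text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

With `{"slide3_fig1.png": "DOE JANE 03/14/1962 CHEST PA", "slide4_fig1.png": "AXIAL CT"}` as input, a search for `"jane doe"` returns `{"slide3_fig1.png"}`: once the pixels have been read as text, the patient’s name leads straight to the image.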
If PHI is exposed, you can ask the search engine company to review the content and remove links to the sensitive information if it agrees that removal is appropriate. Here is an example of how this process works with Google.
Appendix
What Constitutes PHI?
JACR® and the U.S. Department of Health and Human Services website both offer detailed discussions of PHI and the importance of protecting patient data. The individual sharing a medical case is responsible for ensuring that the data have been properly de-identified and that all legal requirements for sharing them have been met.
HIPAA provides two methods of de-identification: Safe Harbor, which specifies data elements that must be removed, and Expert Determination, in which a person with appropriate statistical expertise determines that the risk of re-identifying a patient is very small. When medical imaging cases are prepared for public use, the Safe Harbor method should be used. Under this method, the following data elements must not be shared with the medical image:
- Names.
- Geographic subdivisions smaller than a state.
- All elements of dates (except year) directly related to an individual, including birth date, admission and discharge dates, and date of death; all ages over 89; and all elements of dates (including year) indicative of such ages, except that these may be aggregated into a single category of age 90 or older.
- Telephone, cellphone, and fax numbers.
- Email addresses.
- IP addresses.
- Social Security numbers.
- Medical record numbers.
- Health plan beneficiary numbers.
- Device identifiers and serial numbers.
- Certificate/license numbers.
- Account numbers.
- Vehicle identifiers and serial numbers including license plates.
- Website URLs.
- Full-face photographs and any comparable images.
- Biometric identifiers (including fingerprints and voiceprints).
- Any unique identifying numbers, characteristics, or codes.
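Regular expressions can provide a first-pass screen of text you are about to publish (slide text, speaker notes, or OCR output) for a few of the identifier types listed above. This is a minimal sketch: the patterns are hypothetical, US-centric, and will both miss real identifiers and flag harmless text. Names, geographic subdivisions, and record or account numbers still need human review, and a clean result does not by itself establish Safe Harbor compliance.

```python
import re
from typing import Dict, List

# Rough patterns for a few Safe Harbor identifier types. Matches are items
# to review, not proof of PHI; misses are expected for unusual formats.
PATTERNS: Dict[str, re.Pattern] = {
    "date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://\S+|\bwww\.\S+"),
}


def screen_for_identifiers(text: str) -> Dict[str, List[str]]:
    """Return matches grouped by identifier type; types with no match are omitted."""
    hits = {}
    for label, pattern in PATTERNS.items():
        found = pattern.findall(text)
        if found:
            hits[label] = found
    return hits
```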