Pseudonymisation Tool


Pseudonymisation provides two things: a list of identifying keywords (which includes UIDs) and an anonymisation strategy. The strategy hashes text with SHA3_256 and encodes the result as base64; for UIDs, the hash is instead converted to an integer and appended to the PyMedPhys Org Root. Dates are shifted consistently and ages are jittered. Both the keyword list and the strategy are passed to the (non-experimental) anonymisation module and its functions.
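The hashing steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the PyMedPhys implementation: the function names, the EXAMPLE_ORG_ROOT placeholder, and the truncation to the 64-character DICOM UID limit are all assumptions made for the example.

```python
import base64
import hashlib

# Hypothetical placeholder root; the real PyMedPhys Org Root is not stated here.
EXAMPLE_ORG_ROOT = "1.2.3.4."
MAX_UID_LENGTH = 64  # DICOM limits a UID to 64 characters


def pseudonymise_text(value: str) -> str:
    """Hash text with SHA3-256 and return the digest encoded as base64."""
    digest = hashlib.sha3_256(value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def pseudonymise_uid(uid: str, org_root: str = EXAMPLE_ORG_ROOT) -> str:
    """Hash a UID, read the digest as an integer, and append it to org_root."""
    hex_digest = hashlib.sha3_256(uid.encode("ascii")).hexdigest()
    digits = str(int(hex_digest, 16))
    # Assumption: truncate the digits so the result fits the UID length limit.
    return org_root + digits[: MAX_UID_LENGTH - len(org_root)]
```

Because the output depends only on the input value, the same name or UID always maps to the same pseudonym, which keeps references between files in a study consistent.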


pymedphys.experimental.pseudonymisation.pseudonymise(dicom_input, output_path=None)

Convenience API for pseudonymisation. Elements whose tags are not in the pydicom dictionary will be deleted. PatientSex will not be modified or pseudonymised. For fine-tuned control, use anonymise_dataset() instead.

  • dicom_input (pydicom.dataset.Dataset | str | pathlib.Path) – Either a dataset, a path to a file or a path to a directory

  • output_path (str | pathlib.Path, optional) – If the input is a file or a directory, the directory in which to place the pseudonymised files, by default None


If dicom_input was a dataset, returns the pseudonymised dataset. If dicom_input was a file, returns the path to the pseudonymised file. If dicom_input was a directory, returns the list of successfully pseudonymised files.

Return type

pydicom.dataset.Dataset | str | list of str

pymedphys.experimental.pseudonymisation.is_valid_strategy_for_keywords(identifying_keywords=None, replacement_strategy=None)
The pseudonymisation.pseudonymisation_dispatch strategy, i.e. a dictionary of VRs and the function references that anonymisation uses to achieve pseudonymisation
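To illustrate the idea of a strategy as a dictionary of VRs and function references, here is a small self-contained sketch. It assumes that checking a strategy against a keyword list means confirming every keyword's VR has an entry in the dictionary. The names EXAMPLE_KEYWORD_VR, example_strategy, missing_vrs and covers_keywords are hypothetical and are not part of the PyMedPhys API.

```python
import base64
import hashlib


def _hash_text(value: str) -> str:
    """Hypothetical replacement function: SHA3-256 digest as base64 text."""
    digest = hashlib.sha3_256(value.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical keyword -> VR lookup; a real tool would consult the DICOM dictionary.
EXAMPLE_KEYWORD_VR = {
    "PatientName": "PN",
    "PatientID": "LO",
    "StudyInstanceUID": "UI",
}

# Hypothetical strategy: one replacement function per VR it can handle.
example_strategy = {"PN": _hash_text, "LO": _hash_text}


def missing_vrs(keywords, strategy, keyword_vr=EXAMPLE_KEYWORD_VR):
    """Return the sorted VRs needed by the keywords but absent from the strategy."""
    needed = {keyword_vr[keyword] for keyword in keywords}
    return sorted(needed - set(strategy))


def covers_keywords(keywords, strategy, keyword_vr=EXAMPLE_KEYWORD_VR):
    """True when the strategy has a function for every keyword's VR."""
    return not missing_vrs(keywords, strategy, keyword_vr)
```

In this sketch a strategy with no "UI" entry would be reported as incomplete for any keyword list that contains a UID keyword.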


import pymedphys.experimental.pseudonymisation as pseudonymisation_api

pseudonymisation_api.pseudonymise(ds_input, output_path="/home/myname/pseudo_out/")
# or
ds_pseudo = anonymise_dataset(ds_input,