Like many, I’ve never met a chatbot I trust completely. Not only do they have a propensity to hallucinate by making up facts, but you can never be sure what their parent companies do with the information you provide. Most AI companies say they use your data to further train their models, but anonymize it first. However, you just have to take them at their word on this.
Still, chatbots can be useful for summarizing and explaining complicated information, such as the kind contained in many bank statements, medical reports, and mortgage contracts.
So if you do choose to upload sensitive documents like these, you should take steps to redact as much personal information as possible, not only to protect your privacy from the AI company but also to hedge against future data breaches that could cause your financial and medical records to be spilled across the dark web. Here’s how.
The wrong way to redact your sensitive data
First things first: There’s a right and a wrong way to redact sensitive information, particularly from PDFs, which are the format most of our bank statements, medical records, and contracts come in. As some attorneys general and lawyers have learned the hard way, redacting PDFs the wrong way essentially provides no protection at all.
The “wrong” way is to use a PDF reader’s markup tools, like the pen or highlighter, to scribble out or draw black bars across text. While these methods may hide text to the naked eye, a simple mouse move across the obscured line of text to select it, followed by a copy-and-paste, can often recover it. More advanced PDF tools can also easily remove any digital pen scratches and black highlights entirely, revealing the original text underneath.
In short, the “wrong” way is akin to placing a piece of electrical tape over the lines of a document: it obscures the lines from view, but it can easily be peeled off. So, if you are using this redaction method before uploading your sensitive documents to ChatGPT, your instinct is in the right place, but your execution is off—and that leaves your sensitive personally identifiable information highly vulnerable.
The right way to redact your sensitive information before uploading documents to AI chatbots
The correct way to digitally redact documents is to use a tool specifically designed to destroy underlying data within the PDF’s internal code. These redaction tools literally get rid of the underlying text, making it nearly impossible to recover.
The easiest redaction tool I’ve discovered is built into Apple’s Preview app. Preview is macOS’s default PDF reader (it’s also available on iPhone, but the iOS version lacks a redaction tool). If you’re a Windows user, note that the platform’s native PDF viewer, Microsoft Edge, doesn’t offer such a feature, though a number of third-party apps, like Adobe Acrobat Pro (subscription required) and PDFgear (free), offer redaction tools.
I’ll describe here how to use Apple’s Preview redaction tool, but most other apps’ redaction tools work in similar ways.
How to redact your sensitive information before uploading documents to AI chatbots
The important thing to note about redaction tools is that they are designed to destroy the text you want redacted, making it unreadable. So always be sure to first make a copy of the document you plan to upload to a chatbot, and redact information in the copy.
Always keep the original unredacted document on your computer, so you can access its full contents. If you do not do this, you will lose the ability to read the original document in full, because you will not be able to unredact the text once it is redacted.
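If you prefer to script the copy step rather than duplicate the file by hand, here is a minimal Python sketch. The function name `make_redaction_copy` and the `-redacted` filename suffix are my own assumptions for illustration, not part of Preview or any other redaction tool.

```python
import shutil
from pathlib import Path


def make_redaction_copy(original: str) -> Path:
    """Copy a PDF next to itself with a '-redacted' suffix and return the copy's path."""
    src = Path(original)
    # e.g. statement.pdf -> statement-redacted.pdf (naming scheme is an assumption)
    dst = src.with_name(f"{src.stem}-redacted{src.suffix}")
    if dst.exists():
        # Refuse to overwrite an earlier working copy by accident
        raise FileExistsError(f"{dst} already exists")
    shutil.copy2(src, dst)  # copy2 copies the contents and preserves timestamps
    return dst
```

You would then open the returned copy in your redaction tool and leave the original file untouched.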
Once you’ve made a copy of the document, you are ready to redact. Here’s how:
- Open the copy of the PDF document in the Preview app on your Mac.
- From the menu bar, select Tools>Redact.
- A warning will pop up alerting you that any “redacted content is permanently removed.” Click OK to dismiss the warning.
- Now, move the text selection cursor over any text you want redacted. This may include your name, address, email, phone number, Social Security number, or any other sensitive information. As you drag the text selection tool over your selection, black bars with grey X’s will be laid down across the text. This tells you the text is marked for redaction.
- Continue redacting any text you want across the entire document.
- Once you’ve marked all the text you want redacted, you can move your mouse over the black bars to see the text to be redacted beneath them, if you wish. You can also drag your text cursor back over the text to deselect it for redaction.
- If you are happy with your redaction selection, save the document. But note that even with the save, the selected text still has not been redacted.
- Now that the document has been saved, to complete the redaction, close the PDF (keyboard shortcut: Command-W). Once you do this, the text underneath the redaction markings will be destroyed.
When you open the document again, you’ll see permanent black lines with grey X’s on them where the former text was. But the text beneath those lines has been destroyed and should now be unrecoverable.
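As a sanity check, you can select all the text in the redacted copy, paste it into a plain text file, and scan it for anything that still looks sensitive. The Python sketch below is one rough way to do that. The function name `find_leftovers`, the patterns (US-style Social Security and phone numbers, plus email addresses), and the `extra_terms` parameter are all assumptions made for illustration, and the patterns will miss other formats.

```python
import re

# Rough patterns for common US-style identifiers; real documents may use other formats.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"(?<!\d)(?:\(\d{3}\)\s?|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"),
}


def find_leftovers(text, extra_terms=()):
    """Return (label, match) pairs for anything in text that still looks sensitive."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits += [(label, m.group()) for m in pattern.finditer(text)]
    # Also look for exact terms you know you redacted, such as your name or street
    lowered = text.lower()
    for term in extra_terms:
        if term.lower() in lowered:
            hits.append(("term", term))
    return hits
```

An empty result only means these particular patterns found nothing in the text you pasted. It does not prove the redaction is sound, and it cannot see text inside scanned images.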
A few things to keep in mind
The above method should ensure that your selected text has been redacted correctly, so that it should not be recoverable by an AI chatbot or anyone who accesses the redacted document in the future. However, redacting personally identifiable information in a document doesn’t necessarily keep your identity anonymous from ChatGPT and other AI chatbots.
This is because, even if you redact all your personally identifiable information in the document, if you are logged into ChatGPT, OpenAI will, of course, know that your account is the one that uploaded that March bank statement or that medical report.
This means that if you want as much anonymity as possible, you should not only securely redact sensitive information in your documents before uploading them to AI chatbots, but also not upload them to any AI chatbot that you are logged into. As an added measure, it’s also a good idea to strip a PDF’s metadata before you upload it, as this metadata may include your name or other information.
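Before uploading, you can get a rough idea of whether a PDF still carries obvious metadata with a few lines of Python. The sketch below only reports metadata; it does not strip it. It looks through the raw file bytes for the standard document-information keys (such as `/Author` and `/Producer`) stored as plain, uncompressed strings. The function name `visible_metadata` is hypothetical, and the check is a heuristic: it will miss metadata that is compressed, hex-encoded, or stored as embedded XMP, and `/Title` can also match bookmark titles.

```python
import re

# Standard document-information keys defined by the PDF format
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")


def visible_metadata(pdf_bytes):
    """Return values of info keys that appear as plain literal strings in the file."""
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. "/Author (Jane Doe)"; skips hex strings and nested parentheses
        m = re.search(rb"/" + key + rb"\s*\(([^()]*)\)", pdf_bytes)
        if m:
            # latin-1 never fails to decode; UTF-16 values will look garbled
            found[key.decode()] = m.group(1).decode("latin-1")
    return found
```

To try it, read the redacted copy as bytes (for example, `open("statement-redacted.pdf", "rb").read()`, where the filename is a placeholder) and pass the result to `visible_metadata`. If your name shows up, remove the metadata with a dedicated tool before uploading.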