PDF Metadata Stripper
A PDF metadata stripper removes hidden author, title, creator, producer, creation date, modification date, custom properties, and XMP metadata from a PDF so the file no longer reveals who wrote it. This stripper runs entirely in the browser using pdf-lib, shows every field before removing it, and includes a verify-clean re-upload pass so authors can confirm a double-blind submission is anonymous before sending it to the journal or conference system.
The PDF you drop is parsed and rewritten inside your browser tab. The library code is bundled into this page; no part of your file is sent anywhere.
What is a PDF metadata stripper?
A PDF metadata stripper removes the hidden fields a PDF carries about who wrote it and how it was made: Author, Title, Subject, Keywords, Creator (the application that produced the source), Producer (the library that wrote the PDF), CreationDate, ModDate, any custom properties left over from a Word template, and the XMP stream that often duplicates and extends those fields in XML. For a double-blind manuscript, every one of those fields is a potential leak. This stripper shows you what is in the file, clears both the Info dictionary and the XMP block in your browser, and lets you re-upload the cleaned file to verify it is empty before you submit.
How to anonymize your PDF for double-blind submission
Drag your manuscript into the upload zone. The left pane lists every populated metadata field the file actually contains. Anything that names you, your co-authors, your institution, your university LaTeX template, or your Overleaf project carries identifying information; reviewers will see all of it if you submit without stripping.
The right pane previews the same fields cleared. Click Strip and download and the browser writes a new PDF with the Info dictionary emptied and the XMP stream removed. The file is offered as a download from the same tab; no upload happens.
Underneath the widget is a verify zone. Drop the cleaned PDF back in and every field should render empty. That second pass checks the exact file you are about to upload, so it is the evidence that matters if the venue runs its own metadata check. Screenshot the verify state for your advisor or attach it to your submission notes.
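If you want to script the same check, the verify pass can be sketched roughly as follows. This is an illustration, not this page's actual code: `listPopulatedFields` is a hypothetical name, `pdfDoc` is assumed to be a document already loaded with pdf-lib, and `pdfLib` is assumed to be the library namespace (the `PDFLib` global in pdf-lib's UMD build).

```javascript
// Sketch of a "verify clean" check. Names are illustrative assumptions.
function listPopulatedFields(pdfDoc, pdfLib) {
  // pdf-lib getters for the document-information fields.
  const readers = {
    Title: () => pdfDoc.getTitle(),
    Author: () => pdfDoc.getAuthor(),
    Subject: () => pdfDoc.getSubject(),
    Keywords: () => pdfDoc.getKeywords(),
    Creator: () => pdfDoc.getCreator(),
    Producer: () => pdfDoc.getProducer(),
    CreationDate: () => pdfDoc.getCreationDate(),
    ModDate: () => pdfDoc.getModificationDate(),
  };

  const populated = [];
  for (const [name, read] of Object.entries(readers)) {
    const value = read();
    // Treat undefined and whitespace-only strings as empty.
    const isEmpty =
      value === undefined || (typeof value === 'string' && value.trim() === '');
    if (!isEmpty) populated.push(name);
  }

  // The XMP stream hangs off the catalog's /Metadata entry.
  if (pdfDoc.catalog.has(pdfLib.PDFName.of('Metadata'))) populated.push('XMP');
  return populated;
}

// Usage in the browser. Loading with updateMetadata: false keeps pdf-lib from
// writing its own Producer and ModDate while it parses the file:
//   const pdfDoc = await PDFLib.PDFDocument.load(bytes, { updateMetadata: false });
//   const leftovers = listPopulatedFields(pdfDoc, PDFLib);
```

An empty array means none of the listed fields and no catalog-level XMP entry were found. It says nothing about names in the body text or in annotations.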
What gets exposed when you do not strip PDF metadata
Any reviewer or submission system that opens the file's properties can read every one of these:
- Author. LaTeX \hypersetup{pdfauthor=...} writes your full name here. Word writes whatever is in File → Options → User Information, which usually carries the author of the very first document the template was based on. Overleaf inherits its account-holder name.
- Title. Sometimes carries an internal working title that names the lab or grant.
- Creator. Identifies the source application (Microsoft Word 16.79, LyX 2.3.7, Pages 13.2). Combined with the Producer field it narrows the author pool fast.
- Producer. The library that wrote the PDF (pdfTeX-1.40.25, Skia/PDF m119, Acrobat Distiller). Often paired with a build identifier that ties back to a specific OS install.
- Last Modified By. Survives a Word → Save As PDF if revision metadata is preserved; carries the most recent co-author’s account name.
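- CreationDate / ModDate. Timestamps accurate to the second, stored as PDF date strings (of the form D:YYYYMMDDHHmmSS plus a time-zone offset) in the Info dictionary and as ISO 8601 dates in XMP. Cross-referenced against an Overleaf commit log, they can identify the project.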
- Custom properties. Hidden Word-template fields named Company, Manager, Owner that propagate from the first document opened on the machine.
- XMP block. Duplicates the above as XML in a stream referenced from the document catalog, and adds extras: xmpMM:DocumentID, xmpMM:InstanceID, and a history of derived-from documents. Stripping only the Info dictionary leaves XMP in place, which is the failure mode that catches careful authors out.
- Comments and annotations. Reviewer comments written in Acrobat carry the commenter’s name. Flatten the PDF before submission, or remove comments via Tools → Comment → Delete All.
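Because XMP is the field set most often left behind, a quick independent check is to scan the raw bytes of the file for XMP packet markers. The sketch below is a heuristic under stated assumptions: `findXmpResidue`, `countOccurrences`, and the marker list are hypothetical names chosen for illustration, the list is not exhaustive, and an XMP stream stored with a compression filter would not be detected.

```javascript
// Heuristic sketch: look for XMP residue in the raw bytes of a PDF.
// The marker list is illustrative, not exhaustive.
const XMP_MARKERS = [
  '<?xpacket begin',
  '<x:xmpmeta',
  'xmpMM:DocumentID',
  'xmpMM:InstanceID',
];

// Count how many times an ASCII marker occurs in a byte array.
function countOccurrences(bytes, marker) {
  const needle = Array.from(marker, (ch) => ch.charCodeAt(0));
  let hits = 0;
  for (let i = 0; i + needle.length <= bytes.length; i++) {
    let j = 0;
    while (j < needle.length && bytes[i + j] === needle[j]) j++;
    if (j === needle.length) hits++;
  }
  return hits;
}

// Return a map of marker -> hit count, containing only markers that occur.
function findXmpResidue(bytes) {
  const found = {};
  for (const marker of XMP_MARKERS) {
    const hits = countOccurrences(bytes, marker);
    if (hits > 0) found[marker] = hits;
  }
  return found;
}

// Usage in the browser, where `file` is a File from a drop or input event:
//   const bytes = new Uint8Array(await file.arrayBuffer());
//   const residue = findXmpResidue(bytes);
```

An empty result only shows that none of these plain-text markers are present. A non-empty result on a file you believed was stripped is a strong hint that an XMP packet is still in the file, possibly as an object no longer linked from the catalog.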
Fynman extracts the full metadata block from every PDF in a literature review automatically, which makes the reverse problem (auditing whether included studies report what they should) take minutes instead of hours.
When venues actually check your PDF metadata
Treat every submission system as if it preserves the file exactly as uploaded. Behavior in the wild:
- OpenReview. Strips a small set of fields server-side and warns on others. A safe baseline, not a guarantee; do not rely on it for fields the venue does not explicitly enumerate.
- CMT (Microsoft). Preserves the file as uploaded. Reviewer downloads carry whatever you sent.
- EasyChair. Preserves the file as uploaded.
- Scholastica. Author guidance tells you to strip metadata yourself before upload and links out to Word and Pages instructions. No server-side stripping.
- Nature Research double-blind option. The author checklist requires metadata removal, enforced by self-attestation. A populated Author field that a reviewer notices is grounds for desk rejection.
- ACM, IEEE, ICLR, NeurIPS, ACL, CHI. Conference-specific submission instructions vary year over year. The constant: assume nothing is stripped for you.
The cost of being wrong is a desk reject for breach of double-blind, which usually means waiting a full cycle to resubmit. Stripping locally takes thirty seconds.
How this stripper works (and how to verify the privacy claim yourself)
The file is read into a Uint8Array via the browser’s FileReader API. pdf-lib, vendored as a same-origin asset on this page, parses the PDF, clears the document-information fields (setTitle(''), setAuthor(''), setSubject(''), setKeywords([]), setProducer(''), setCreator('')), and removes the XMP metadata stream from the document catalog. The output is a fresh Uint8Array piped into a Blob and offered as a download via an object URL. There is no fetch, no XMLHttpRequest, no WebSocket.
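A minimal sketch of that stripping step, under stated assumptions, looks like this. `stripDocumentMetadata` is a hypothetical name, not this page's actual function. `pdfDoc` is assumed to be a pdf-lib `PDFDocument`, and `pdfLib` the library namespace (the `PDFLib` global in the UMD build). The sketch covers only the setters named above and the XMP entry; how CreationDate and ModDate are handled is not specified here, so it is left out. The extra `context.delete` call rests on the assumption that pdf-lib would otherwise still write an unlinked stream on save.

```javascript
// Sketch of the stripping step. Names are illustrative assumptions.
function stripDocumentMetadata(pdfDoc, pdfLib) {
  // Clear the document-information fields with pdf-lib's setters.
  pdfDoc.setTitle('');
  pdfDoc.setAuthor('');
  pdfDoc.setSubject('');
  pdfDoc.setKeywords([]);
  pdfDoc.setProducer('');
  pdfDoc.setCreator('');

  // Unlink the XMP stream from the document catalog.
  const metadataKey = pdfLib.PDFName.of('Metadata');
  const metadataRef = pdfDoc.catalog.get(metadataKey);
  pdfDoc.catalog.delete(metadataKey);

  // Assumption: an unlinked object left in the context would still be
  // serialized, so drop the stream object itself when the entry was a reference.
  if (metadataRef instanceof pdfLib.PDFRef) {
    pdfDoc.context.delete(metadataRef);
  }
  return pdfDoc;
}

// Usage in the browser, where `file` is a File from a drop or input event
// (Blob.arrayBuffer() is used here for brevity instead of FileReader):
//   const bytes = new Uint8Array(await file.arrayBuffer());
//   const pdfDoc = await PDFLib.PDFDocument.load(bytes, { updateMetadata: false });
//   stripDocumentMetadata(pdfDoc, PDFLib);
//   const cleaned = await pdfDoc.save();
//   const url = URL.createObjectURL(new Blob([cleaned], { type: 'application/pdf' }));
```

Passing `updateMetadata: false` to `load` matters in this sketch: with the default, pdf-lib writes its own Producer and modification date while parsing, which would repopulate fields you are trying to empty.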
The privacy badge at the top of this page carries a live counter that monkey-patches window.fetch and XMLHttpRequest.prototype.open after the page-load event. Each request bumps the counter. When you use the stripper, the counter stays at zero. To prove it for yourself: open browser DevTools, switch to the Network tab, clear the log, and drop a PDF. No new entries appear.
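The counter idea can be sketched in a few lines. This is an illustration of the monkey-patching technique described above, not the badge's actual source: `installRequestCounter`, `onChange`, and `getCount` are hypothetical names, and the function takes the global object as a parameter so it can be exercised outside a browser.

```javascript
// Sketch: count outgoing requests by wrapping fetch and XMLHttpRequest.open.
// `scope` is the global object (window in a browser). Names are illustrative.
function installRequestCounter(scope, onChange = () => {}) {
  let count = 0;
  const bump = () => {
    count += 1;
    onChange(count); // e.g. update the number shown in a badge
  };

  if (typeof scope.fetch === 'function') {
    const originalFetch = scope.fetch;
    scope.fetch = function (...args) {
      bump();
      return originalFetch.apply(scope, args);
    };
  }

  if (scope.XMLHttpRequest) {
    const proto = scope.XMLHttpRequest.prototype;
    const originalOpen = proto.open;
    proto.open = function (...args) {
      bump();
      return originalOpen.apply(this, args);
    };
  }

  return { getCount: () => count };
}

// Usage in the browser, after the load event:
//   window.addEventListener('load', () => {
//     const counter = installRequestCounter(window, (n) => console.log('requests:', n));
//   });
```

A counter like this only sees calls that go through the two patched entry points, which is why the DevTools Network tab remains the stronger check: it is maintained by the browser itself and cannot be bypassed by page code.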
What this tool does not do (and what to use instead)
Honest scope, because trust here matters more than feature breadth:
- Body text scanning. The stripper does not search the paper for your name, your co-authors, or your institution. Use Find and Replace in your editor for that pass before exporting to PDF.
- Comment-author scrubbing. Annotation and comment authors written by Acrobat are preserved (the stripper only touches document-level metadata). For comment scrubbing use Acrobat’s Sanitize Document command, or delete all comments before exporting.
- Track Changes residue. If you exported from Word with revision metadata still enabled, the Last Modified By field can return on the next save. Run Inspect Document in Word and remove personal information before exporting to PDF.
- Image EXIF stripping. Figures embedded in academic PDFs almost never carry EXIF, but if your manuscript embeds raw camera output, run it through an EXIF cleaner first.
- Encrypted or password-protected PDFs. Decrypt first, strip second.
Frequently asked questions