Mergers and acquisitions is the original virtual data room use case. The category was invented in the late 1990s specifically to digitize the paper-stuffed "deal rooms" that M&A bankers, corporate lawyers, and due-diligence teams used to fly between for weeks during big transactions. The global VDR market is widely cited in the low-single-digit billions of dollars in annual revenue, with M&A workflows accounting for a meaningful majority of that. The rest split across fundraising, board portals, clinical trials, vendor diligence, and miscellaneous IP licensing.
What hasn't changed in two-plus decades, until very recently, is how deal teams operate the rooms. The traditional way to provision an M&A data room is the slow way: a banker emails a vendor sales contact, an account executive schedules a 30-minute discovery call, an admin user is created, the deal team uploads documents by hand over the course of a day or two, and bidders are manually invited one at a time. The whole onboarding routinely takes several business days at minimum, and the people doing it are billing real hourly rates for the time.
With a programmable VDR (one with a public REST API, a CLI, and ideally a Model Context Protocol server for AI agents), the same workflow takes minutes and can be triggered automatically when a deal stage changes in your CRM. This article walks through what that looks like end-to-end, using the open-source Papermark API as the worked example.
The M&A dataroom lifecycle, expressed as API calls
Every transaction, regardless of size, passes through the same five stages from a sharing-infrastructure perspective:
- Provision: create the dataroom container. Upload the data tape: financial statements (typically 3 years historical + 1 year projection), legal documents (charter, bylaws, material contracts, IP filings, employment agreements), operational records, and the cap table. A typical mid-market M&A room contains 500-5,000 documents totaling 2-15 GB.
- Organize: folder structure. The industry-conventional layout is Financials / Legal / IP / Operations / HR / Cap Table / Customer Contracts / Q&A. Tagging documents by category enables granular bidder-specific permissions later in the process.
- Distribute: mint per-bidder share links. Each bidder gets their own URL with their own watermark, their own expiry, and optionally their own document subset (some bidders see less in round 1 than in round 2). For a competitive process with 15-25 bidders, this is 15-25 distinct link records.
- Watch: track engagement. Page-level dwell time, return visits, country-of-access, drop-off pages. Bidders who spend 18 minutes on the financial model are not the same bidders who skim the deck in 30 seconds, and the engagement signal predicts who's actually competitive.
- Conclude: revoke links on close (or on bidder elimination from the process). Archive the dataroom but retain the audit log indefinitely for post-close governance, tax review, and any subsequent disputes.
Each of these stages maps to one or two API endpoints. The rest of this article shows the exact code.
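As a rough illustration of the Organize stage, the sketch below routes files into the conventional folder layout by filename keyword. The keyword rules and the `route_to_folder` helper are assumptions made up for this example, not part of the Papermark API; a real deal team would tune them to its own naming conventions.

```python
from pathlib import PurePath
from typing import Optional

# Hypothetical keyword rules mapping filename fragments to the conventional
# folders listed above. Order matters: the first matching rule wins, so
# "customer" is checked before the more general "contract".
RULES = [
    ("cap table", "Cap Table"),
    ("customer", "Customer Contracts"),
    ("income statement", "Financials"),
    ("balance sheet", "Financials"),
    ("projection", "Financials"),
    ("patent", "IP"),
    ("trademark", "IP"),
    ("employment", "HR"),
    ("bylaws", "Legal"),
    ("charter", "Legal"),
    ("contract", "Legal"),
]


def route_to_folder(filename: str) -> Optional[str]:
    """Return the target folder for a file, or None if no rule matches."""
    # Normalize "Cap_Table-2025.xlsx" to "cap table 2025" before matching.
    stem = PurePath(filename).stem.lower().replace("_", " ").replace("-", " ")
    for keyword, folder in RULES:
        if keyword in stem:
            return folder
    return None  # leave unmatched files for a human to file
```

Returning `None` for unmatched files, rather than guessing a default folder, keeps misfiled documents out of bidder-visible folders until someone has looked at them.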
Provision a deal room from a CRM webhook
When your CRM moves a deal to the "Due Diligence" stage, fire a webhook to a small handler that provisions the room. The handler creates the dataroom, builds the standard folder tree, uploads a template kit of documents that exist on every deal, and returns the dataroom ID.
#!/usr/bin/env bash
set -euo pipefail
DEAL_NAME="$1"     # e.g. "Project Pelican"
TEMPLATE_DIR="$2"  # local folder with the standard kit
API="https://api.papermark.com/v1"
: "${PAPERMARK_TOKEN:?get a token at https://app.papermark.com/settings/tokens}"
# 1: create the dataroom (jq builds the JSON so quotes in the deal name are escaped)
DR_ID=$(jq -n --arg name "$DEAL_NAME" '{name: $name, description: "M&A — confidential"}' \
  | curl -sS --fail -X POST "$API/datarooms" \
      -H "Authorization: Bearer $PAPERMARK_TOKEN" \
      -H "Content-Type: application/json" \
      -d @- \
  | jq -r '.data.id')
# 2: create the standard folder tree (8 folders for typical M&A)
for folder in Financials Legal IP Operations HR "Cap Table" "Customer Contracts" "Q&A"; do
  jq -n --arg name "$folder" '{name: $name}' \
    | curl -sS --fail -X POST "$API/datarooms/$DR_ID/folders" \
        -H "Authorization: Bearer $PAPERMARK_TOKEN" \
        -H "Content-Type: application/json" \
        -d @- > /dev/null
done
# 3: bulk upload the template kit (NDAs, deal teaser, management bios)
for f in "$TEMPLATE_DIR"/*.pdf; do
  [ -f "$f" ] || continue
  curl -sS --fail -X POST "$API/documents" \
    -H "Authorization: Bearer $PAPERMARK_TOKEN" \
    -F "file=@$f" \
    -F "dataroom_id=$DR_ID" > /dev/null
done
echo "Provisioned dataroom $DR_ID for $DEAL_NAME"
Or in Python with the SDK, which handles retries, using an asyncio semaphore to cap upload concurrency:
import asyncio
import glob

from papermark import Papermark


async def provision(deal_name: str, template_dir: str):
    pm = Papermark()  # picks up PAPERMARK_TOKEN
    room = pm.datarooms.create(
        name=deal_name,
        description="M&A — confidential",
    )
    folders = ["Financials", "Legal", "IP", "Operations", "HR",
               "Cap Table", "Customer Contracts", "Q&A"]
    for name in folders:
        pm.datarooms.folders.create(room.id, name=name)
    # Parallel upload: 8x concurrency is a good default
    paths = glob.glob(f"{template_dir}/*.pdf")
    sem = asyncio.Semaphore(8)

    async def upload(path):
        async with sem:
            with open(path, "rb") as f:
                await pm.documents.upload_async(file=f, dataroom_id=room.id)

    await asyncio.gather(*(upload(p) for p in paths))
    return room.id
For a 500-document data tape on a typical broadband connection, this script finishes in 90-180 seconds. The equivalent manual workflow, drag-and-drop uploads into a vendor UI one folder at a time, takes 90-180 minutes and is the kind of thing junior analysts get paid $90,000/year to do at 1am.
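The webhook trigger described at the start of this section might look like the following sketch. The payload fields (`deal_id`, `deal_name`, `previous_stage`, `new_stage`) are hypothetical, since every CRM shapes its webhooks differently, and the function only decides whether to provision; the provisioning itself is the script above.

```python
import json
from typing import Optional, Set

TRIGGER_STAGE = "Due Diligence"


def deal_to_provision(body: bytes, already_provisioned: Set[str]) -> Optional[str]:
    """Return the deal name if this webhook should provision a room, else None.

    Assumes a hypothetical payload with deal_id, deal_name, previous_stage
    and new_stage keys.
    """
    event = json.loads(body)
    entered_stage = (
        event.get("new_stage") == TRIGGER_STAGE
        and event.get("previous_stage") != TRIGGER_STAGE
    )
    deal_id = event["deal_id"]
    # Webhook deliveries can be retried, so skip deals that were already handled.
    if not entered_stage or deal_id in already_provisioned:
        return None
    already_provisioned.add(deal_id)
    return event["deal_name"]
```

Keeping the decision separate from the HTTP layer makes it easy to unit test. In production the in-memory set would be replaced by a durable store so that a restart does not provision a second room for the same deal.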
Per-bidder links with forensic watermarks
In a competitive process, each bidder needs their own link for three reasons:
- Engagement attribution. Without per-bidder links, you can't tell which fund's analyst is the one obsessing over page 47 of the financial model.
- Leak attribution. Watermarks are forensic, not preventative. If a screenshot of confidential financials leaks onto Twitter or into the press, the watermark identifies which bidder held the link that produced the leak. Knowing that is a strong deterrent against casual leakage.
- Per-bidder policy. Round-1 bidders often see a teaser deck and basic financials. Round-2 bidders see the full model, customer contracts, and IP. Different links to the same dataroom enforce that distinction without duplicating content.
import { randomBytes } from "node:crypto";
import { Papermark } from "@papermark/sdk";

const pm = new Papermark();
// Random URL-safe password, generated once per bidder
const generatePassword = () => randomBytes(12).toString("base64url");
const bidders = [
  { name: "Acme PE", email: "deals@acme-pe.com", round: 2 },
  { name: "Bravo Capital", email: "ic@bravocap.com", round: 2 },
  { name: "Carbon Holdings", email: "diligence@carbon.holdings", round: 1 },
  // … typically 12-25 bidders in a competitive auction
];
const links = await Promise.all(
  bidders.map(async (b) => {
    // Keep the password alongside the link; otherwise nobody can open it
    const password = generatePassword();
    const link = await pm.links.create({
      dataroomId: "dr_pelican",
      password,
      requireEmail: true,
      allowDownload: false,
      watermark: `${b.name} · {{email}} · {{timestamp}} · CONFIDENTIAL`,
      expiresAt: new Date("2026-09-30"),
      // Round-1 bidders see only the "round-1" folder
      folderFilter: b.round === 1 ? ["fld_round_1_materials"] : undefined,
    });
    return { bidder: b, password, link };
  }),
);
// Drop links into your CRM contact record (`crm` stands in for your CRM client).
// Send each `password` to its bidder through a separate channel.
for (const { bidder, link } of links) {
  await crm.updateContact(bidder.email, {
    dataroomUrl: link.url,
    dataroomLinkId: link.id,
    dataroomMintedAt: new Date(),
  });
}
The watermark template substitutes recipient identifiers on every page render server-side. A leaked screenshot from Bidder A's session shows Acme PE · john.doe@acme-pe.com · 2026-08-14 14:22 UTC diagonally across the page. Even if cropped, the identifier is usually traceable.
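To make the substitution concrete, here is a minimal sketch of how a `{{placeholder}}` template could be filled in at render time. It is not Papermark's implementation; the `render_watermark` function and the timestamp format are assumptions chosen to match the example above.

```python
import re
from datetime import datetime, timezone

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_watermark(template: str, email: str, when: datetime) -> str:
    """Fill the viewer's email and the render time into a watermark template."""
    values = {
        "email": email,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
    }
    # Unknown placeholders are left untouched rather than silently dropped.
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), template)


text = render_watermark(
    "Acme PE · {{email}} · {{timestamp}} · CONFIDENTIAL",
    "john.doe@acme-pe.com",
    datetime(2026, 8, 14, 14, 22, tzinfo=timezone.utc),
)
print(text)
```

Because the bidder name is baked into the template when the link is created and the email and timestamp are filled in per view, each rendered page carries both the link holder and the individual viewer.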
Engagement signals as deal signal
Page-level dwell time on the financial model is one of the most predictive bidder signals in a competitive auction. A 2022 industry survey of 312 sell-side bankers found that bidders who spent more than 12 minutes on the financial model in a single session had a 4.3x higher probability of submitting a final bid than bidders who didn't. The signal is even stronger when normalized for deal size: sub-$100M deals show higher predictive lift.
The Papermark analytics API exposes per-page durations:
const events = await pm.links.views.list("lnk_acme_bidder", {
  since: "2026-05-01",
});
// Flatten each view into one row per page
const heatmap = events.flatMap((v) =>
  v.pages.map((p) => ({
    page: p.number,
    seconds: p.duration_seconds,
    visitor: v.visitor.email,
    document: v.document.name,
    visitedAt: v.viewed_at,
  })),
);
// Pipe into your warehouse (`bigquery.insert` stands in for your own loader)
await bigquery.insert("ma_engagement", heatmap);
The heatmap data feeds three downstream applications worth building:
- Bidder ranking dashboard. Sort active bidders by total dwell time on high-signal documents (financial model, customer cohort analysis, IP filings). This becomes your weekly view of who's actually competitive.
- Real-time Slack alerts. When a target bidder finally cracks open the model, the deal team learns in seconds, not at the next Monday status meeting.
- Drop-off analysis. If 9 out of 12 bidders abandoned on page 23 of the offering memorandum, page 23 has a problem. Often it's a hard claim that didn't survive scrutiny. Fix it before the next refresh.
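A minimal sketch of the bidder ranking, assuming the flattened rows have been exported as dictionaries with `visitor`, `document`, and `seconds` keys (mirroring the heatmap shape above). The `HIGH_SIGNAL` filename fragments are hypothetical and would need to match your own document names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical filename fragments that mark high-signal documents.
HIGH_SIGNAL = ("financial_model", "cohort", "ip_filing")


def rank_bidders(rows: Iterable[Dict]) -> List[Tuple[str, float]]:
    """Return (visitor, total seconds on high-signal documents), highest first."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        name = row["document"].lower()
        if any(fragment in name for fragment in HIGH_SIGNAL):
            totals[row["visitor"]] += row["seconds"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Time spent on low-signal documents such as the teaser deck is deliberately ignored, so a bidder who lingers on the deck does not outrank one who worked through the model.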
Build a Slack alert on the dwell-time signal so the deal team knows in real time:
// `event` is a single page-view event; `slack` stands in for your Slack client
if (event.document.name.includes("Financial_Model") && event.duration_seconds > 600) {
  await slack.post({
    channel: "#deal-pelican",
    text: `🎯 ${event.visitor.email} (${event.visitor.fund}) just spent ${Math.round(event.duration_seconds / 60)}m on the model`,
  });
}
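The drop-off analysis from the list above can be sketched in a similar way. This version assumes rows with `visitor`, `document`, and `page` keys and treats the highest page number a visitor reached as the point where they stopped, which is a simplification for readers who jump around.

```python
from collections import Counter
from typing import Dict, Iterable


def drop_off_pages(rows: Iterable[Dict], document: str, total_pages: int) -> Counter:
    """Count, per page, how many visitors stopped reading there."""
    last_page: Dict[str, int] = {}
    for row in rows:
        if row["document"] != document:
            continue
        visitor = row["visitor"]
        last_page[visitor] = max(last_page.get(visitor, 0), row["page"])
    # Visitors who reached the final page finished the document, so they
    # are not counted as drop-offs.
    return Counter(page for page in last_page.values() if page < total_pages)
```

Calling `.most_common(1)` on the result gives the single page where the most bidders stopped, which is the page to review first.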
Revoke on close
When the deal closes, or when a bidder is eliminated from the process, revoke their link in one call. The link returns 410 Gone on the next request, including requests from browser tabs that were already open.
# Revoke a single bidder
papermark links revoke lnk_acme_bidder
# Revoke every link on the deal at close
papermark datarooms list-links dr_pelican --json | \
jq -r '.data[].id' | \
xargs -I{} papermark links revoke {}
The dataroom itself stays archived, preserving the audit history for SEC, FINRA, or whichever regulator has jurisdiction, but no external party can read it.
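Choosing which links to revoke is simple enough to sketch. The link records below, with `id`, `bidder_email`, and `revoked` keys, are a hypothetical shape rather than the actual API response.

```python
from typing import Dict, Iterable, List, Set


def links_to_revoke(links: Iterable[Dict], eliminated: Set[str], deal_closed: bool) -> List[str]:
    """Return the IDs of still-active links that should be revoked now."""
    ids = []
    for link in links:
        if link["revoked"]:
            continue  # already revoked, nothing to do
        # At close every link goes; before close only eliminated bidders do.
        if deal_closed or link["bidder_email"] in eliminated:
            ids.append(link["id"])
    return ids
```

The returned IDs can then be fed to whatever revoke call you use, one per link.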
What this changes economically
The traditional M&A data room is sized for one transaction at a time, priced per page or per seat, with no published price list and procurement-led purchasing. Independent VDR comparison sources commonly cite per-page rates of $0.40-$0.85 and mid-market engagement totals in the $25,000-$100,000 range; the actual quoted price is whatever the vendor proposes for the specific engagement and is not publicly comparable in advance. For a PE platform running 8-15 transactions per year, total annual VDR spend on Datasite-class incumbents reaches the mid-six-figures.
The programmable VDR shifts the economics in three ways:
- Time-to-first-share collapses from days to minutes. No procurement, no sales call, no MSA. For PE shops running rapid auctions, this is the difference between hitting a deadline and missing it.
- Per-deal cost drops dramatically. Papermark's published Data Rooms tier is €99/month flat (~$1,300/year) for 3 included users, with self-host available on Enterprise for marginal hosting cost only.
- Operational glue lives in your CRM/orchestrator. Provisioning, distribution, and analytics are workflow steps in HubSpot, Salesforce, or Affinity, not separate vendor logins for every deal-team member.
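The arithmetic behind the per-deal cost comparison, as a back-of-the-envelope sketch. The per-page rates and the flat monthly price come from the figures quoted above; the 50,000-page room size is an assumption for illustration only, and rates are kept in integer cents to avoid floating-point rounding.

```python
PAGES = 50_000           # assumed room size, not a figure from any vendor
RATE_CENTS = (40, 85)    # $0.40 to $0.85 per page, as cited above
FLAT_EUR_PER_MONTH = 99  # published flat tier, as cited above


def per_page_total_usd(pages: int, rate_cents: int) -> int:
    """Total per-page charge in whole dollars."""
    return pages * rate_cents // 100


low = per_page_total_usd(PAGES, RATE_CENTS[0])
high = per_page_total_usd(PAGES, RATE_CENTS[1])
flat_per_year_eur = FLAT_EUR_PER_MONTH * 12

print(f"Per-page pricing: ${low:,} to ${high:,} per deal")
print(f"Flat pricing: EUR {flat_per_year_eur:,} per year")
```

Under that page-count assumption, a single per-page engagement costs many times the flat tier's annual price, and the gap widens with every additional deal in the year.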
For PE shops, corp-dev teams, sell-side bankers, and M&A boutiques running more than one deal at a time, that combination is the difference between "the dataroom is the workflow" and "the dataroom is a step in our workflow."
The minimal "no banker required" version
The minimum viable script that turns a folder of PDFs and a list of bidders into a tracked, watermarked, gated M&A data room:
DR=$(papermark datarooms create --name "$DEAL" --json | jq -r '.data.id')
find "$DOCS" -name '*.pdf' -exec papermark documents upload {} --dataroom "$DR" \;
while IFS=, read -r NAME EMAIL FUND; do
  PW=$(openssl rand -base64 12)  # keep the password so it can be sent to the bidder
  papermark links create --dataroom "$DR" \
    --require-email --watermark "$NAME · $FUND · {{timestamp}}" \
    --password "$PW" \
    --expires "2026-12-31" --json | \
    jq -r --arg email "$EMAIL" --arg pw "$PW" '[.data.url, $email, $pw] | @tsv'
done < bidders.csv > links.tsv
Output is a TSV with one URL per bidder, ready to paste into your outreach sequence.
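Before minting links it may be worth validating the bidder list. The sketch below assumes `bidders.csv` has three columns (name, email, fund) and no header row, matching the `read` loop above; the `load_bidders` helper is illustrative.

```python
import csv
import io
from typing import Dict, List


def load_bidders(text: str) -> List[Dict[str, str]]:
    """Parse NAME,EMAIL,FUND rows, rejecting malformed or duplicate entries."""
    bidders: List[Dict[str, str]] = []
    seen = set()
    for lineno, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"line {lineno}: expected NAME,EMAIL,FUND")
        name, email, fund = (cell.strip() for cell in row)
        if "@" not in email:
            raise ValueError(f"line {lineno}: {email!r} is not an email address")
        if email.lower() in seen:
            raise ValueError(f"line {lineno}: duplicate bidder {email}")
        seen.add(email.lower())
        bidders.append({"name": name, "email": email, "fund": fund})
    return bidders
```

Unlike a plain `IFS=,` split, the `csv` module also copes with quoted fields, so a fund name such as "Carbon Holdings, LLC" survives intact.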