A virtual data room without an audit log is just file-sharing with extra steps. The audit log is the evidentiary backbone of due diligence, securities-litigation discovery, regulatory inquiries, GDPR subject-access requests, internal governance review, and post-incident forensics. For regulated organizations (healthcare under HIPAA, EU companies under GDPR, US public companies under SOX 404), the audit log is not optional: it's a compliance artifact.
This article covers how the Papermark audit log is structured, how to query it efficiently, what real-world reports to build on top of it, how to detect suspicious access patterns, and how to pipe the events into your data warehouse for long-term analytics and BI integration. Worked examples use the Papermark API; the patterns generalize to any VDR that exposes its audit log through an API.
The view event schema
Every visit to a Papermark link produces one view record. Here's the complete shape; the less obvious fields are explained below it:
{
  "id": "vw_01HXY7P3K2NQR4",
  "link_id": "lnk_pelican_acme",
  "dataroom_id": "dr_pelican",
  "document_id": "doc_deck_v3",
  "document_name": "Series A Deck v3.pdf",
  "visitor": {
    "id": "vis_01HXY7Q8K2",
    "email": "alice@acme-pe.com",
    "email_verified": true,
    "ip": "203.0.113.42",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_6_1) AppleWebKit/605.1.15",
    "country": "US",
    "region": "California",
    "city": "San Francisco",
    "timezone": "America/Los_Angeles"
  },
  "viewed_at": "2026-04-22T14:11:08.123Z",
  "ended_at": "2026-04-22T14:41:48.456Z",
  "duration_seconds": 1840,
  "pages": [
    { "number": 1, "duration_seconds": 12, "first_seen_at": "2026-04-22T14:11:08Z" },
    { "number": 2, "duration_seconds": 340, "first_seen_at": "2026-04-22T14:11:20Z" },
    { "number": 3, "duration_seconds": 88, "first_seen_at": "2026-04-22T14:17:00Z" }
  ],
  "downloads": 0,
  "downloads_attempted": 2,
  "exit_page": 3,
  "watermark_text": "Acme PE · alice@acme-pe.com · 2026-04-22 14:11 UTC",
  "actions": [
    { "type": "right_click_blocked", "page": 2, "at": "2026-04-22T14:14:11Z" },
    { "type": "print_blocked", "page": 3, "at": "2026-04-22T14:17:30Z" }
  ]
}
Every field is queryable. Events are immutable, apart from the PII anonymization described under Retention and deletion. Events are retained indefinitely on the standard tier; self-hosted deployments can configure retention for organizations whose policies require deletion.
A few fields worth understanding in depth:
- duration_seconds is the time from viewed_at to ended_at, capturing the full session. It is not the same as the sum of per-page durations: page durations can overlap (multi-tab viewing) and can include idle time on a single page.
- downloads_attempted vs downloads: a download attempt blocked by link policy (allow_download: false) is still recorded. This is signal: someone who attempted two downloads and was blocked is meaningfully different from someone who viewed without attempting.
- actions captures behaviors the viewer detected and then blocked or allowed: right-click attempts, print attempts, copy attempts, and screenshot detection (where the browser supports it). These are attempt records, not completed actions.
- exit_page is the last page the user reached. If the deck is 22 pages and many sessions end with exit_page 7, something on page 7 is losing readers.
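To make these distinctions concrete, here is a minimal TypeScript sketch that summarizes a single view event. The interfaces cover only the fields used, with types inferred from the example payload rather than a published schema; `summarizeView` is a hypothetical helper, and treating `downloads_attempted` as counting every attempt (successful or not) is an assumption.

```typescript
// Subset of the view event fields discussed above (types inferred from the example).
interface PageView {
  number: number;
  duration_seconds: number;
  first_seen_at: string;
}

interface ViewAction {
  type: string; // e.g. "right_click_blocked", "print_blocked"
  page: number;
  at: string;
}

interface ViewEventLite {
  id: string;
  duration_seconds: number;
  pages: PageView[];
  downloads: number;
  downloads_attempted: number;
  exit_page: number;
  actions: ViewAction[];
}

interface ViewSummary {
  sessionSeconds: number;
  pageSeconds: number;
  // Positive: session time not attributed to any page.
  // Negative: page timers overlapped (e.g. multi-tab viewing).
  sessionMinusPages: number;
  blockedDownloads: number;
  blockedActions: number;
  reachedFraction: number; // exit_page as a share of the document length
}

function summarizeView(v: ViewEventLite, totalPages: number): ViewSummary {
  const pageSeconds = v.pages.reduce((sum, p) => sum + p.duration_seconds, 0);
  return {
    sessionSeconds: v.duration_seconds,
    pageSeconds,
    sessionMinusPages: v.duration_seconds - pageSeconds,
    // Assumption: downloads_attempted includes successful downloads.
    blockedDownloads: Math.max(0, v.downloads_attempted - v.downloads),
    blockedActions: v.actions.filter((a) => a.type.endsWith("_blocked")).length,
    reachedFraction: totalPages > 0 ? v.exit_page / totalPages : 0,
  };
}
```

For the example event against a 22-page deck, the page timers account for 440 of the 1,840 session seconds, two downloads were blocked, two blocked actions were logged, and the reader got through 3 of 22 pages.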
The four common query shapes
1. Views for one link, paginated:
curl "https://api.papermark.com/v1/links/lnk_pelican_acme/views?since=2026-04-01&limit=100" \
-H "Authorization: Bearer $PAPERMARK_TOKEN"
Returns up to 100 events at a time. The response includes meta.next_cursor for pagination on links with many views.
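Collecting every event for a busy link means following meta.next_cursor until it runs out. A minimal, transport-agnostic sketch, assuming each page arrives as `{ data, meta: { next_cursor } }` and that a null or empty cursor marks the last page; `collectAll` and `fetchPage` are hypothetical names:

```typescript
// One page of results, modelled on the meta.next_cursor field described above.
// The exact envelope is an assumption.
interface Page<T> {
  data: T[];
  meta: { next_cursor: string | null };
}

// fetchPage is supplied by the caller and performs the actual HTTP request.
async function collectAll<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
  maxPages = 10_000, // safety valve against a cursor that never terminates
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | null = null;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    all.push(...page.data);
    if (!page.meta.next_cursor) return all; // last page
    cursor = page.meta.next_cursor;
  }
  throw new Error(`Pagination did not finish within ${maxPages} pages`);
}
```

Wire `fetchPage` to the endpoint by sending the cursor back on each request; the query-parameter name for that isn't shown here, so check the API reference. Keeping the HTTP call outside the loop also makes the pagination logic easy to test with canned pages.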
2. Views for one visitor across all links they accessed:
curl "https://api.papermark.com/v1/visitors/vis_01HXY/views" \
-H "Authorization: Bearer $PAPERMARK_TOKEN"
Useful for "show me everything Alice has ever looked at across our entire workspace."
3. Single view detail (page-by-page granularity):
curl "https://api.papermark.com/v1/views/vw_01HXY7P3K2NQR4" \
-H "Authorization: Bearer $PAPERMARK_TOKEN"
Returns the full event including the per-page array and the actions array.
4. Aggregated analytics for a dataroom, link, or document:
curl "https://api.papermark.com/v1/datarooms/dr_pelican/analytics?from=2026-04-01&to=2026-05-31" \
-H "Authorization: Bearer $PAPERMARK_TOKEN"
Returns engagement summaries (total views, unique visitors, total view-seconds, drop-off curves) rather than raw events. Use these for dashboards; use the raw events for forensics.
Patterns that come up in practice
Engagement leaderboard
For an M&A or fundraising process, you typically want a sorted table of bidders by total dwell time on high-signal documents. This is the report deal teams want every Monday morning:
import { Papermark } from "@papermark/sdk";
const pm = new Papermark();
const links = await pm.datarooms.listLinks("dr_pelican");
const board: Array<{
  bidder: string;
  visits: number;
  totalMinutes: number;
  lastViewed: string | null;
  deepestPage: number;
}> = [];
for (const link of links) {
  const analytics = await pm.links.analytics(link.id);
  board.push({
    // Bidder label = first segment of the watermark ("Acme PE · ..."); fall back to the link ID.
    bidder: link.watermark?.split(" · ")[0] ?? link.id,
    visits: analytics.view_count,
    totalMinutes: Math.round(analytics.total_duration_seconds / 60),
    lastViewed: analytics.last_view_at,
    deepestPage: analytics.max_page,
  });
}
// Most engaged bidders first.
board.sort((a, b) => b.totalMinutes - a.totalMinutes);
console.table(board);
Pipe this into a Slack channel weekly and the deal team has perpetual situational awareness on bidder engagement without anyone manually checking the dashboard.
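One way to do the Slack step is a Slack incoming webhook, which accepts a JSON POST with a `text` field. The sketch below formats the `board` array from the loop above as a monospaced table and posts it; `formatLeaderboard` and `postToSlack` are hypothetical helper names, the webhook URL is one you create in Slack, and scheduling (cron, a CI job) is left to you.

```typescript
// Row shape produced by the leaderboard loop above.
interface BoardRow {
  bidder: string;
  visits: number;
  totalMinutes: number;
  lastViewed: string | null;
  deepestPage: number;
}

const FENCE = "`".repeat(3); // Slack renders text between triple backticks as monospace

// Render the leaderboard as an aligned, monospaced table.
function formatLeaderboard(board: BoardRow[], title = "Weekly bidder engagement"): string {
  const header = ["Bidder", "Visits", "Minutes", "Deepest", "Last viewed"];
  const rows = board.map((r) => [
    r.bidder,
    String(r.visits),
    String(r.totalMinutes),
    String(r.deepestPage),
    r.lastViewed ? r.lastViewed.slice(0, 10) : "never", // date part of the ISO timestamp
  ]);
  const widths = header.map((h, i) => Math.max(h.length, ...rows.map((row) => row[i].length)));
  const line = (cells: string[]) =>
    cells.map((c, i) => c.padEnd(widths[i])).join("  ").replace(/\s+$/, "");
  return [`*${title}*`, FENCE, line(header), ...rows.map(line), FENCE].join("\n");
}

// POST the message to a Slack incoming webhook URL.
async function postToSlack(webhookUrl: string, text: string): Promise<void> {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`Slack webhook returned ${res.status}`);
}
```

Usage after the loop: `await postToSlack(process.env.SLACK_WEBHOOK_URL!, formatLeaderboard(board));`, where the environment variable name is your own choice.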
Page-level drop-off curve
Where do bidders stop reading? This tells you which slide killed the deck or which document in the dataroom needs work:
// Assumes list() returns every view for the link; if it returns one page at a time, collect all pages first.
const events = await pm.links.views.list("lnk_pelican_acme");
const byPage: Record<number, { count: number; totalSec: number }> = {};
for (const v of events) {
  for (const p of v.pages) {
    byPage[p.number] ||= { count: 0, totalSec: 0 };
    byPage[p.number].count += 1; // sessions that reached this page, not unique people
    byPage[p.number].totalSec += p.duration_seconds;
  }
}
const heatmap = Object.entries(byPage)
  .map(([n, x]) => ({
    page: Number(n),
    visitors: x.count,
    avgSeconds: Math.round(x.totalSec / x.count),
  }))
  .sort((a, b) => a.page - b.page);
console.table(heatmap);
// page  visitors  avgSeconds
//    1        47          12
//    2        47          38
//    3        45          84
//  ...
//   14        23           4   ← drop-off cliff: page 13 has a problem
The pattern you're looking for: pages where visitors falls dramatically between consecutive numbers, or where avgSeconds is much lower than the neighboring pages. The first indicates abandonment; the second indicates a quick scan that didn't engage.
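Spotting those patterns by eye works for one deck; across many, a small helper can flag them. A sketch over the `heatmap` rows produced above: it flags abandonment when the visitor count drops by at least `dropRatio` from one page to the next, and a quick scan when a page's `avgSeconds` is under `dwellRatio` times the average of its neighboring pages. The thresholds are illustrative defaults, not tuned values, and `flagPages` is a hypothetical name.

```typescript
// One row of the heatmap built above.
interface HeatmapRow {
  page: number;
  visitors: number;
  avgSeconds: number;
}

interface Flag {
  page: number;
  reason: "abandonment" | "skimmed";
}

function flagPages(rows: HeatmapRow[], dropRatio = 0.3, dwellRatio = 0.25): Flag[] {
  const sorted = [...rows].sort((a, b) => a.page - b.page);
  const flags: Flag[] = [];
  // Abandonment: a sharp fall in visitors between consecutive rows.
  for (let i = 1; i < sorted.length; i++) {
    const prev = sorted[i - 1];
    const cur = sorted[i];
    if (prev.visitors > 0 && (prev.visitors - cur.visitors) / prev.visitors >= dropRatio) {
      // Readers who left never saw `cur`; the last page they read is `prev`.
      flags.push({ page: prev.page, reason: "abandonment" });
    }
  }
  // Skimmed: dwell time far below the neighboring pages' average.
  for (let i = 0; i < sorted.length; i++) {
    const neighbors: HeatmapRow[] = [];
    if (i > 0) neighbors.push(sorted[i - 1]);
    if (i < sorted.length - 1) neighbors.push(sorted[i + 1]);
    if (neighbors.length === 0) continue;
    const avgNeighbor = neighbors.reduce((s, r) => s + r.avgSeconds, 0) / neighbors.length;
    if (avgNeighbor > 0 && sorted[i].avgSeconds < dwellRatio * avgNeighbor) {
      flags.push({ page: sorted[i].page, reason: "skimmed" });
    }
  }
  return flags;
}
```

Consistent with the annotated output above, an abandonment flag points at the last page readers saw before leaving (the page before the cliff), not at the page they never reached.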
Compliance export
For an audit committee, regulatory inquiry, or securities-class-action discovery request, you need a structured export covering a defined time window. This is the report counsel will ask for, by email, with a 48-hour turnaround expectation:
papermark datarooms views dr_pelican \
--since 2026-01-01 \
--until 2026-06-30 \
--json > pelican-audit-h1.json
# Convert to CSV, with a header row, for non-technical reviewers (lawyers, paralegals, regulators)
jq -r '(
  ["view_id", "viewed_at", "email", "ip", "country", "document", "duration_seconds", "downloads", "exit_page"],
  (.data[] | [
    .id,
    .viewed_at,
    .visitor.email,
    .visitor.ip,
    .visitor.country,
    .document_name,
    .duration_seconds,
    .downloads,
    .exit_page
  ])
) | @csv' pelican-audit-h1.json > pelican-audit-h1.csv
For a 90-day M&A process with 20 bidders, expect 200-800 view events. The CSV is typically 50-300 KB.
Suspicious access detection
The audit log makes anomaly detection straightforward. Patterns worth alerting on, with example detection logic:
- Geographic anomaly: a view from a country the visitor has never accessed from before. Significant in M&A contexts where bidder identity matters.
- Identity mismatch: a visitor opening a link that was issued to someone else, visible when the email entered at the email gate doesn't match the recipient the link was created for.
- High-frequency access: more than N views in a 1-hour window on a single link, suggesting either an attack or a bot.
- Bulk download attempt: a high downloads_attempted count on documents that don't allow download, especially across multiple documents in quick succession.
- Off-hours access: views from a visitor's tracked timezone outside plausible business hours, repeatedly. A soft signal, but useful.
- Right-click / print spamming: many right_click_blocked or print_blocked actions in a single session, suggesting the visitor is actively trying to extract content beyond what the link permits.
- Watermark stripping attempts: multiple very short page views in sequence, characteristic of screenshot-each-page workflows aimed at producing un-watermarked copies. The watermark is server-rendered, so the screenshots still carry it, but the attempt is telling.
// Assumes pm (the SDK client from the leaderboard example), linkId, and a
// slack.alert helper of your own are in scope.
const hoursAgo = (h: number) => new Date(Date.now() - h * 60 * 60 * 1000).toISOString();
const recent = await pm.links.views.list(linkId, { since: hoursAgo(1) });
// Pattern 3: high-frequency access
if (recent.length > 50) {
  await slack.alert(`⚠️ ${linkId}: ${recent.length} views in last hour (possible bot/scrape)`);
}
// Pattern 4: bulk download attempts
const downloadAttempts = recent.reduce((sum, v) => sum + v.downloads_attempted, 0);
if (downloadAttempts > 10) {
  await slack.alert(`⚠️ ${linkId}: ${downloadAttempts} download attempts blocked in last hour`);
}
// Pattern 6: extraction attempts (many blocked right-click or print actions in one session)
const extractionTypes = new Set(["right_click_blocked", "print_blocked"]);
const suspectSessions = recent.filter(
  (v) => v.actions.filter((a) => extractionTypes.has(a.type)).length > 5,
);
if (suspectSessions.length > 0) {
  await slack.alert(
    `⚠️ Possible extraction attempt on ${linkId}: ${suspectSessions.length} sessions with repeated blocked right-click/print actions`,
  );
}
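The snippet above covers patterns 3, 4, and 6. Patterns 1, 5, and 7 need per-visitor history or per-page timing rather than simple counts; here is a sketch of each as a pure function over view events. Field names come from the schema at the top of this article, while the function names, the 07:00 to 20:00 business-hours window, and the 3-second "fast page" threshold are assumptions to tune. Local time is derived from the visitor's `timezone` field with the standard `Intl.DateTimeFormat` API.

```typescript
// Subset of the view event used by these checks (field names from the schema above).
interface VisitorView {
  visitor: { email: string; country: string; timezone: string };
  viewed_at: string;
  pages: { number: number; duration_seconds: number; first_seen_at: string }[];
}

// Pattern 1: countries in recent views that never appear in the visitor's history.
function newCountries(history: VisitorView[], recent: VisitorView[]): string[] {
  const known = new Set(history.map((v) => v.visitor.country));
  return Array.from(new Set(recent.map((v) => v.visitor.country))).filter((c) => !known.has(c));
}

// Pattern 5: is the view outside business hours in the visitor's own timezone?
function isOffHours(v: VisitorView, startHour = 7, endHour = 20): boolean {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: v.visitor.timezone,
    hour: "numeric",
    hour12: false,
  }).formatToParts(new Date(v.viewed_at));
  // % 24 guards against engines that render midnight as "24" when hour12 is false.
  const hour = Number(parts.find((p) => p.type === "hour")?.value) % 24;
  return hour < startHour || hour >= endHour;
}

// Pattern 7: longest run of consecutive (in time) page views that each lasted
// at most maxSeconds, as in a screenshot-every-page workflow.
function longestFastRun(v: VisitorView, maxSeconds = 3): number {
  const ordered = [...v.pages].sort(
    (a, b) => Date.parse(a.first_seen_at) - Date.parse(b.first_seen_at),
  );
  let best = 0;
  let run = 0;
  for (const p of ordered) {
    run = p.duration_seconds <= maxSeconds ? run + 1 : 0;
    best = Math.max(best, run);
  }
  return best;
}
```

Feed `newCountries` the visitor's history from the per-visitor endpoint and alert on a non-empty result; for the other two, alert when `isOffHours` is true repeatedly or `longestFastRun` exceeds a page count you consider implausible for a human reader.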
Piping to your data warehouse
Two options, neither strictly better:
Webhook-driven (preferred for low-latency dashboards): Subscribe to view.completed events and write directly to your warehouse. Each event lands within ~5 seconds of the view ending. Good for real-time alerting and engagement dashboards that refresh on read.
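// /api/papermark-webhook/route.ts
// verifiedPayload (signature check), bigquery (warehouse client wrapper) and
// flatten (event-to-row mapping) are helpers you supply.
export async function POST(req: Request) {
  const event = await verifiedPayload(req);
  if (event.type === "view.completed") {
    await bigquery.insert("view_events", flatten(event.data));
  }
  return new Response("ok");
}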
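The route above relies on a `verifiedPayload` helper that isn't shown. A common webhook scheme is an HMAC-SHA256 of the raw request body with a shared secret, sent as a hex digest in a header; the sketch below assumes exactly that, with a hypothetical header name and an explicit `secret` parameter. Papermark's actual signing scheme may differ, so check its webhook documentation before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical header name; substitute whatever the sender actually uses.
const SIGNATURE_HEADER = "x-papermark-signature";

// Compare the received hex digest with HMAC-SHA256(secret, rawBody).
function signatureMatches(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so check length first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

async function verifiedPayload(
  req: Request,
  secret: string,
): Promise<{ type: string; data: unknown }> {
  const rawBody = await req.text(); // verify the exact bytes before parsing
  const signature = req.headers.get(SIGNATURE_HEADER) ?? "";
  if (!signatureMatches(rawBody, signature, secret)) {
    throw new Error("Invalid webhook signature");
  }
  return JSON.parse(rawBody);
}
```

Two details matter whatever the scheme: verify the raw body before `JSON.parse` (re-serializing can change the bytes), and compare digests with `timingSafeEqual` rather than `===`. In the route, catch the thrown error and return a 401 so unsigned requests never reach the warehouse.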
Pull-based (preferred for backfills and reconciliation): Run a daily job that pages through /v1/views?since=<last_cursor> and writes the deltas. Reliable, easy to backfill historical data, and resilient to webhook delivery failures.
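// readWatermark/writeWatermark persist the last cursor between runs (your own helpers).
let cursor: string | null = await readWatermark();
let totalInserted = 0;
while (true) {
  const page = await pm.views.list({ since: cursor, limit: 500 });
  if (page.data.length === 0) break;
  await bigquery.insertBatch("view_events", page.data.map(flatten));
  totalInserted += page.data.length;
  // Stop on the last page without overwriting the saved cursor with null; a null
  // watermark would make the next run re-sync from the beginning. The last page
  // may be re-read next run, so make the insert idempotent on view_id.
  if (!page.meta.next_cursor) break;
  cursor = page.meta.next_cursor;
}
await writeWatermark(cursor);
console.log(`Synced ${totalInserted} events through ${cursor}`);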
In production, most teams run both: webhooks for the live path, daily pull as belt-and-suspenders to catch any missed events.
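Running both paths means the same view can arrive twice. If your warehouse write isn't already an upsert, the daily pull can skip what the webhook path already delivered, keyed on the event `id` (the `view_id` column). A minimal sketch, assuming you can fetch the set of existing `view_id`s for the window being reconciled; `missingEvents` is a hypothetical helper:

```typescript
// Only the event id matters for de-duplication.
interface HasId {
  id: string;
}

// Return pulled events not yet in the warehouse, also dropping repeats within the batch.
function missingEvents<T extends HasId>(pulled: T[], existingIds: ReadonlySet<string>): T[] {
  const seen = new Set<string>();
  const out: T[] = [];
  for (const e of pulled) {
    if (existingIds.has(e.id) || seen.has(e.id)) continue;
    seen.add(e.id);
    out.push(e);
  }
  return out;
}
```

In a warehouse with MERGE support, the same logic can live in SQL instead, which avoids shipping the ID set to the client.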
Schema for the warehouse
A reasonable flattened schema, written with BigQuery types (STRING, INT64, JSON); for Postgres use TEXT, BIGINT, and JSONB, and map similarly for Snowflake or Redshift:
CREATE TABLE view_events (
view_id STRING NOT NULL,
link_id STRING NOT NULL,
dataroom_id STRING NOT NULL,
document_id STRING NOT NULL,
document_name STRING,
visitor_email STRING,
visitor_ip STRING,
visitor_country STRING,
visitor_city STRING,
viewed_at TIMESTAMP NOT NULL,
ended_at TIMESTAMP,
duration_seconds INT64,
exit_page INT64,
pages_viewed INT64,
downloads INT64,
downloads_attempted INT64,
watermark_text STRING,
raw_event JSON -- the full event for ad-hoc query later
);
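-- Postgres only: BigQuery, Redshift, and Snowflake standard tables don't use secondary indexes like these.
-- There, partition by viewed_at and cluster by link_id (BigQuery), set sort keys (Redshift), or define clustering keys (Snowflake).
CREATE INDEX idx_view_events_link ON view_events(link_id, viewed_at);
CREATE INDEX idx_view_events_visitor ON view_events(visitor_email, viewed_at);
CREATE INDEX idx_view_events_dataroom ON view_events(dataroom_id, viewed_at);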
The raw_event JSON column matters because audit-log-relevant questions are often unpredictable in advance ("did anyone view document X with IP from country Y between dates A and B?"). Keeping the full event lets you query against fields you didn't think to flatten.
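Both ingestion paths above call a `flatten` helper. A sketch of one that produces rows matching this table, assuming the field types in the example payload. It counts `pages_viewed` as distinct page numbers (an interpretation, not something the API defines) and serializes `raw_event` as a JSON string, which many warehouse clients accept for JSON columns, though yours may want an object.

```typescript
// Event fields read by flatten(); types inferred from the example payload.
interface PapermarkViewEvent {
  id: string;
  link_id: string;
  dataroom_id: string;
  document_id: string;
  document_name?: string;
  visitor: { email?: string | null; ip?: string | null; country?: string | null; city?: string | null };
  viewed_at: string;
  ended_at?: string | null;
  duration_seconds?: number;
  exit_page?: number;
  pages?: { number: number }[];
  downloads?: number;
  downloads_attempted?: number;
  watermark_text?: string;
}

// One row of view_events; keys match the column names in the DDL above.
interface ViewRow {
  view_id: string;
  link_id: string;
  dataroom_id: string;
  document_id: string;
  document_name: string | null;
  visitor_email: string | null;
  visitor_ip: string | null;
  visitor_country: string | null;
  visitor_city: string | null;
  viewed_at: string;
  ended_at: string | null;
  duration_seconds: number | null;
  exit_page: number | null;
  pages_viewed: number | null;
  downloads: number | null;
  downloads_attempted: number | null;
  watermark_text: string | null;
  raw_event: string;
}

function flatten(e: PapermarkViewEvent): ViewRow {
  return {
    view_id: e.id,
    link_id: e.link_id,
    dataroom_id: e.dataroom_id,
    document_id: e.document_id,
    document_name: e.document_name ?? null,
    visitor_email: e.visitor.email ?? null,
    visitor_ip: e.visitor.ip ?? null,
    visitor_country: e.visitor.country ?? null,
    visitor_city: e.visitor.city ?? null,
    viewed_at: e.viewed_at,
    ended_at: e.ended_at ?? null,
    duration_seconds: e.duration_seconds ?? null,
    exit_page: e.exit_page ?? null,
    // Distinct page numbers reached in this session.
    pages_viewed: e.pages ? new Set(e.pages.map((p) => p.number)).size : null,
    downloads: e.downloads ?? null,
    downloads_attempted: e.downloads_attempted ?? null,
    watermark_text: e.watermark_text ?? null,
    // Full event kept verbatim for ad-hoc queries on fields not flattened here.
    raw_event: JSON.stringify(e),
  };
}
```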
Retention and deletion
Audit events are retained indefinitely on the Papermark standard tier. For GDPR right-to-erasure compliance, you can anonymize a visitor's record across all historical events:
papermark visitors delete vis_01HXY --confirm
This nullifies the email, IP, and other PII fields in all historical events, preserving the integrity of the engagement statistics (durations, page numbers, view counts) without identifying the person. The structural audit trail remains intact for governance purposes.
For broader retention policy management (e.g., delete all events older than 7 years to comply with a corporate retention policy), self-hosted deployments can configure automated deletion. The hosted service requires explicit deletion via API.
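The anonymization command above acts on Papermark's own copy of the events; copies already exported to your warehouse need the same treatment on your side. A sketch for one flattened row, using a hypothetical `anonymizeRow` helper: it nulls the visitor columns, redacts the email from `watermark_text` (which embeds it in the example payload), and strips the visitor block from `raw_event`, keeping country only as coarse geography. Which fields count as personal data is a decision for your privacy team; this list is an assumption.

```typescript
// Warehouse columns that can identify a person (names from the DDL above).
interface WarehouseRow {
  view_id: string;
  visitor_email: string | null;
  visitor_ip: string | null;
  visitor_city: string | null;
  watermark_text: string | null;
  raw_event: string | null; // serialized event JSON
}

// Escape regex metacharacters so the email is matched literally.
const escapeRegExp = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

function anonymizeRow<T extends WarehouseRow>(row: T, email: string): T {
  const pattern = new RegExp(escapeRegExp(email), "gi"); // case-insensitive match
  const redact = (s: string | null) => (s === null ? null : s.replace(pattern, "[redacted]"));
  let raw = row.raw_event;
  if (raw !== null) {
    const event = JSON.parse(raw);
    // Keep only coarse geography from the visitor block; drop email, IP, city, etc.
    event.visitor = { country: event.visitor?.country ?? null };
    if (typeof event.watermark_text === "string") {
      event.watermark_text = redact(event.watermark_text);
    }
    raw = JSON.stringify(event);
  }
  return {
    ...row,
    visitor_email: null,
    visitor_ip: null,
    visitor_city: null,
    watermark_text: redact(row.watermark_text),
    raw_event: raw,
  };
}
```

Engagement columns (durations, pages, downloads) pass through untouched, which keeps the warehouse statistics consistent with what Papermark retains.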
What you can't do (current limitations worth knowing)
A few things on the roadmap but not in the current API:
- Real-time streaming of in-progress views. Currently you see the view after it ends. Sub-second streaming is on the 2026 H2 roadmap.
- Sub-page region tracking (where on a page the reader scrolled and lingered). View durations are per-page only. Useful, but coarser than heatmap-style tracking.
- Reading SLAs or comparative baselines (e.g., "this is the 80th percentile dwell time across all decks in your workspace"). Compute these yourself in your warehouse. The data is there, the API doesn't pre-aggregate.
- Cross-link visitor identity stitching beyond email. If a visitor opens two links with different verified emails, they're tracked as different visitors. Email is the identity key.
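For the comparative baselines listed above, a percentile over per-deck dwell times is usually enough. A sketch using the nearest-rank definition (linear interpolation, which many warehouses' built-in percentile functions use, gives slightly different values on small samples); `percentile` and `percentileRank` are hypothetical helpers, and the input is whatever per-deck metric you aggregate in your warehouse, such as average seconds per view.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no values");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based; multiply first to limit float error
  return sorted[rank - 1];
}

// Share of decks (in percent) whose metric is at or below x.
function percentileRank(values: number[], x: number): number {
  if (values.length === 0) throw new Error("no values");
  return (100 * values.filter((v) => v <= x).length) / values.length;
}
```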