How to Summarize PDFs with AI
From 10-page reports to 500-page books, with the right model and prompt for each length.
AI can read a PDF in seconds and tell you what matters. The trick is matching the right model + prompt to the document length, because not every AI handles long documents equally.
The constraint behind everything in this guide is the context window: how much text a model can hold at once, measured in tokens. A page of dense text is roughly 500-700 tokens, so a 128K-token model fits about 200 pages and a 200K model about 300. Exceed the window and the model silently truncates: it summarises what it can see and never mentions the chapters it dropped. Most bad AI summaries trace back to exactly this.
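The page arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the guide's 500-700 tokens-per-page figure; the function name `pages_that_fit` is made up for illustration, and real capacity is lower because the prompt and the model's reply share the same window.

```python
# Rough page capacity of a context window, using the guide's
# 500-700 tokens-per-page range for dense text.
TOKENS_PER_PAGE_LIGHT = 500  # lower end of the range
TOKENS_PER_PAGE_DENSE = 700  # upper end of the range


def pages_that_fit(context_tokens: int) -> tuple[int, int]:
    """Return (pessimistic, optimistic) page counts for a context window.

    Hypothetical helper: it ignores the tokens used by the prompt and
    by the model's answer, so treat the result as an upper bound.
    """
    pessimistic = context_tokens // TOKENS_PER_PAGE_DENSE
    optimistic = context_tokens // TOKENS_PER_PAGE_LIGHT
    return pessimistic, optimistic


for window in (128_000, 200_000):
    low, high = pages_that_fit(window)
    print(f"{window:,} tokens: roughly {low}-{high} pages")
```

The guide's "about 200 pages" and "about 300 pages" sit inside the ranges this prints for the 128K and 200K windows.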
This guide covers PDFs of 1,000 pages and more: choosing the right tool, writing prompts that produce summaries you can act on, and building the verification habits that catch failures.
Step-by-step guide
Choose your model based on PDF length
Different models have different context windows:
- Up to 50 pages: any model works. ChatGPT 4o is fastest.
- 50-200 pages: use Claude Sonnet 4 (200K context).
- 200-1500 pages: use Gemini 2.5 Pro (1M context, the only one of the three that fits a document this long in one pass).
Unsure how big your document is in model terms? Paste a typical page into the token counter and multiply by the page count. When a document sits near a boundary, go up a tier: a truncated summary fails silently, and that's worse than a slower model. For the long-document case specifically, Claude vs Gemini is the comparison that matters: Claude is more accurate within its window; Gemini's window is 5x larger.
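The "count a typical page, multiply, go up a tier near a boundary" rule might look like the sketch below. The tier boundaries come from the list above; the 600-token nominal page and the 10% safety margin are assumptions chosen for illustration, and `recommend_tier` is a hypothetical helper, not part of any tool.

```python
# Tier boundaries in pages, taken from the list above.
TIERS = [
    (50, "any model"),
    (200, "Claude Sonnet 4"),
    (1500, "Gemini 2.5 Pro"),
]

NOMINAL_TOKENS_PER_PAGE = 600  # assumption: midpoint of 500-700 tokens/page
BOUNDARY_MARGIN = 0.9          # assumption: "near a boundary" = within 10%


def recommend_tier(page_count: int, tokens_on_sample_page: int) -> str:
    """Pick the smallest tier that holds the document with some headroom."""
    estimated_tokens = page_count * tokens_on_sample_page
    # Express the document in "nominal pages" so unusually dense or
    # sparse pages are compared fairly against the page-based tiers.
    equivalent_pages = estimated_tokens / NOMINAL_TOKENS_PER_PAGE
    for max_pages, label in TIERS:
        if equivalent_pages <= max_pages * BOUNDARY_MARGIN:
            return label
    return "chunk and combine"


print(recommend_tier(page_count=90, tokens_on_sample_page=600))
print(recommend_tier(page_count=190, tokens_on_sample_page=600))
```

With these assumptions a 190-page document is pushed up to the larger tier even though it is nominally under 200 pages, which is the intended bias: a slower model is cheaper than a silently truncated summary.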
All three are available on AskAI.free Pro.
Upload the PDF
On AskAI.free, drag-and-drop the PDF into the chat composer (Pro/Max feature). The AI extracts the text automatically, so there is no need to convert the file to text manually.
If your PDF is image-only (scanned), the model will run OCR. Expect lower accuracy on poorly scanned documents, and treat any numbers it quotes from a scan as unverified until you check them.
Common mistakes at this step: uploading a password-protected PDF (remove the password first; the extractor can't open it), and uploading 5 documents at once then asking questions that don't name which document you mean. One document per conversation keeps answers clean unless you're explicitly comparing them.
Verify the AI actually read the whole document
Before asking for any summary, run this check:
"Before summarising: what is this document's title, how many pages or sections does it have, and what is the final section about?"
The last part is the tripwire. If the model describes the ending correctly, the full document is in context. If it gets vague ("the document concludes with closing remarks") or describes a chapter that's actually in the middle, the file was truncated. Switch to a bigger-context model or split the PDF before continuing.
This 20-second check is the single highest-value habit in this guide. Skipping it is how people end up presenting a summary of two-thirds of a contract.
Use a tiered summary prompt
This all-in-one prompt gives you a usable summary at three levels of detail:
"Summarise this document in three layers:
1. TL;DR: one sentence.
2. Key points: bullet list, max 10.
3. Action items: what should I actually do based on this?"
You get a one-glance answer plus the depth if you want it. Adapt layer 3 to the document type: for a contract, "obligations, deadlines and penalties that apply to me"; for a research paper, "findings, limitations the authors admit, and what they'd study next"; for board minutes, "decisions made and who owns each follow-up."
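If you reuse the tiered prompt often, the layer-3 swap can be kept in a small template. The sketch below only assembles the prompt text; the wording of each layer comes from this guide, while the function name `tiered_prompt` and the dictionary keys are illustrative choices.

```python
# Layer-3 wording per document type, taken from the guide.
LAYER_3_BY_TYPE = {
    "contract": "Obligations, deadlines and penalties that apply to me.",
    "research paper": (
        "Findings, limitations the authors admit, "
        "and what they'd study next."
    ),
    "board minutes": "Decisions made and who owns each follow-up.",
}
DEFAULT_LAYER_3 = "Action items: what should I actually do based on this?"


def tiered_prompt(doc_type: str = "") -> str:
    """Build the three-layer summary prompt, adapting layer 3 if possible."""
    layer_3 = LAYER_3_BY_TYPE.get(doc_type.strip().lower(), DEFAULT_LAYER_3)
    return "\n".join(
        [
            "Summarise this document in three layers:",
            "1. TL;DR: one sentence.",
            "2. Key points: bullet list, max 10.",
            f"3. {layer_3}",
        ]
    )


print(tiered_prompt("contract"))
```

Unknown document types fall back to the generic "action items" layer, so the template never produces a prompt with a missing third layer.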
Expected output: a tight executive summary in 15-30 seconds. If layer 2 comes back as ten vague bullets ("the document discusses various risks"), the document is probably long and diverse. Ask for a section-by-section summary instead, then have it roll those up.
Ask follow-up questions
The AI now has the whole document in context. Ask anything:
- "What does this contract say about termination?"
- "Are there any red flags I should be worried about?"
- "Compare what this paper says about X to the standard view."
- "Show me the relevant quote and its page number."
That last pattern deserves to be a habit, not an option. Appending "quote the exact sentence and page number" to any factual question forces the model to retrieve rather than improvise, and it turns verification into a 10-second page-flip. If the AI can't produce a quote for a claim, treat the claim as a guess.
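One way to make the citation habit automatic is to append the request to every factual question before sending it. This is a minimal sketch; `with_citation` is a hypothetical helper that only edits the question text and does not talk to any model.

```python
CITATION_SUFFIX = " Quote the exact sentence and page number."


def with_citation(question: str) -> str:
    """Append the citation request unless the question already ends with it."""
    question = question.strip()
    if question.lower().endswith(CITATION_SUFFIX.strip().lower()):
        return question
    return question + CITATION_SUFFIX


print(with_citation("What does this contract say about termination?"))
```

The check for an existing suffix keeps the helper safe to apply twice, for example when a saved question is edited and resent.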
For documents over 1500 pages, chunk and combine
Even Gemini caps at ~1500 pages of dense text. For longer documents:
- Split into 100-page chunks (any free PDF splitter works).
- Summarise each chunk with the tiered prompt above, in separate conversations.
- Paste all the summaries into a final "summarise the summaries" prompt.
This loses some nuance (cross-references between chunk 2 and chunk 9 disappear), so tell the final prompt what to look for: "These are summaries of one document in sequence. Note any contradictions or themes that recur across chunks." Keep chunk summaries detailed (300+ words each); over-compressed chunks give the final pass nothing to work with.
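The bookkeeping for chunk-and-combine can be sketched as follows. The code only plans page ranges and assembles the final prompt; splitting the PDF and running each chunk through a model still happen by hand as described above. The function names are illustrative, and the 100-page chunk size and 300-word floor are the figures from this guide.

```python
MIN_WORDS = 300  # the guide's floor for a useful chunk summary

FINAL_INSTRUCTION = (
    "These are summaries of one document in sequence. "
    "Note any contradictions or themes that recur across chunks."
)


def chunk_ranges(total_pages: int, chunk_size: int = 100) -> list[tuple[int, int]]:
    """Return 1-indexed, inclusive page ranges covering the whole document."""
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def too_short(chunk_summaries: list[str]) -> list[int]:
    """Return the 1-indexed chunks whose summaries fall under MIN_WORDS."""
    return [
        i
        for i, summary in enumerate(chunk_summaries, start=1)
        if len(summary.split()) < MIN_WORDS
    ]


def combine_prompt(chunk_summaries: list[str]) -> str:
    """Assemble the final 'summarise the summaries' prompt, in order."""
    total = len(chunk_summaries)
    parts = [FINAL_INSTRUCTION]
    for i, summary in enumerate(chunk_summaries, start=1):
        parts.append(f"--- Chunk {i} of {total} ---\n{summary.strip()}")
    return "\n\n".join(parts)


print(chunk_ranges(250))
```

Labelling each summary with its position gives the final pass a chance to notice that chunk 2 and chunk 9 disagree, which is the cross-reference information that chunking otherwise loses.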
Worked example: a 90-page lease, start to finish
Sam is signing a commercial lease, 90 pages including riders. The flow: upload to Claude Sonnet 4 (90 pages fits 200K easily). Verify: asked "What's the final section about?", Claude correctly describes the personal-guarantee rider. Tiered summary: the TL;DR flags a 5-year term with a personal guarantee; action items include a rent-escalation clause worth negotiating.
Then targeted follow-ups: "Quote every clause about early termination, with page numbers" (three clauses, on pages 34, 41 and 88; the one on page 88 contradicts page 34). "What's the total cost if I exit in year 2?" Sam checks the three quoted pages himself, confirms the contradiction is real, and walks into the lawyer meeting with two precise questions instead of paying for a full read. Time spent: 20 minutes.
The biggest pitfall: trusting the AI's summary without spot-checking. AI can hallucinate details, especially on long documents. Always verify the key facts before relying on a summary for important decisions.
Quick troubleshooting reference:
- Summary misses whole chapters: context overflow. Use a bigger model or chunk.
- Numbers slightly wrong: scanned PDF and OCR errors. Check against the original.
- Confident answers without quotes: you forgot to force citations.
- Vague mush: your prompt asked for "a summary" instead of the tiered format.
Pro tip: ask the AI to quote the relevant page or section for any claim it makes. This forces it to retrieve from the document instead of guessing.
Try the techniques above on AskAI.free: your first question is free.
FAQ
Can AI summarize a contract?
Yes. Claude Sonnet 4 is particularly good at extracting clauses, deadlines and obligations from contracts, and at flagging unusual terms when you ask "what's non-standard here?". The right way to use it: get the summary, then make the AI quote page numbers for every obligation it lists, then read those pages yourself. Treat it as a fast first-pass reader that tells your lawyer where to look, not as the lawyer. Always verify legally binding terms with a professional.
Is it safe to upload confidential PDFs?
Reasonably safe for most business documents: AskAI.free doesn't train on uploads and connections are HTTPS. The risk calculus changes with the stakes: for trade secrets, anything under NDA with strict terms, classified material or patient health data, either don't upload at all or check whether your situation requires an enterprise agreement with a data-processing addendum. A good habit for sensitive-but-not-secret documents: redact names and figures before upload; summaries survive redaction fine.
Can free models summarize PDFs?
File upload on AskAI.free is a Pro feature, but there's a workaround for short documents: select-all and copy the text out of the PDF, then paste it directly into the chat. That works up to roughly 20-30 pages before you hit message-length limits. Beyond that you genuinely need upload plus a long-context model. The 7-day Pro trial lets you test the full pipeline on a real document without payment.
Why did the AI miss something that's clearly in the document?
Three usual suspects. First, truncation: the document exceeded the context window and the model never saw that section. Run the "describe the final section" check. Second, the lost-in-the-middle effect: long-context models pay slightly less attention to the middle of huge inputs, so ask about specific sections directly. Third, scanned pages: OCR can garble tables and small print. If something matters, ask "quote the passage about X verbatim"; forcing retrieval usually surfaces it.
What's the best AI for summarizing books?
Gemini 2.5 Pro's 1M-token window is the only one of the three that swallows a very long book (up to roughly 1,500 pages) in one pass, so it wins on raw capacity. For anything under roughly 300 pages, Claude Sonnet 4 produces noticeably better summaries: tighter themes, better quotes, less listing. The practical answer: Claude until the book doesn't fit, then Gemini, then chunk-and-combine for omnibus editions. All three approaches work on AskAI.free Pro.