Generating PDFs in Python with xhtml2pdf (HTML to PDF)

Blog / Python · June 17, 2021 · Updated June 10, 2026 · 9 min read

To generate a PDF from HTML in Python with xhtml2pdf, install it with pip install xhtml2pdf, import its pisa module, and call pisa.CreatePDF(source_html, dest=file_handle) — where source_html is your HTML string and dest is any writable file object such as an open file or an in-memory BytesIO buffer. That one call turns styled HTML into a paginated PDF, with no browser and no system binaries to install.

That last point is what sets xhtml2pdf apart: it is pure Python (built on top of ReportLab), so it installs entirely from PyPI with pip and runs anywhere Python runs — handy for containers, AWS Lambda, and locked-down servers where you cannot install native packages. The trade-off is that it understands only a limited subset of HTML and CSS, which makes it a great fit for simple, templated documents like invoices, receipts, and reports, but a poor fit for complex modern layouts.

This guide covers installation, the modern Python 3 pisa API, rendering an HTML string to a BytesIO buffer, returning a PDF from a Django view, the link_callback you need to make images and CSS appear, page setup and page breaks, and an honest comparison with WeasyPrint, pdfkit, and ReportLab so you can pick the right tool. We have built HTML-to-PDF pipelines like this across Python and Django projects for 12+ years, so the code reflects what works in production.

Install xhtml2pdf

Unlike pdfkit (which shells out to the external wkhtmltopdf binary) or WeasyPrint (which needs the native Pango libraries), xhtml2pdf has no system dependencies. A single pip command pulls in everything it needs, including ReportLab:

pip install xhtml2pdf

This installs on Python 3.8+ (Python 2 is long gone — ignore any old tutorial that imports cStringIO or StringIO.StringIO). On Python 3 you use io.BytesIO instead, because a PDF is binary data.
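
To see why the buffer type matters, here is a minimal standard-library sketch that writes the same bytes into both kinds of buffer. The sample bytes are a stand-in for real output, not something xhtml2pdf produced.

```python
from io import BytesIO, StringIO

# Stand-in for PDF output: real PDF files begin with the bytes "%PDF-".
fake_pdf_chunk = b"%PDF-1.4\n"

# A binary buffer accepts bytes, which is what a PDF writer emits.
binary_buffer = BytesIO()
binary_buffer.write(fake_pdf_chunk)

# A text buffer only accepts str, so writing bytes raises TypeError.
text_buffer = StringIO()
try:
    text_buffer.write(fake_pdf_chunk)
    text_buffer_accepts_bytes = True
except TypeError:
    text_buffer_accepts_bytes = False

print(binary_buffer.getvalue())
print(text_buffer_accepts_bytes)
```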

The pisa API: CreatePDF

The whole library boils down to one function. pisa.CreatePDF(src, dest, ...) reads HTML from src and writes the resulting PDF into the dest file object, returning a status object whose .err attribute is truthy when rendering failed:

from io import BytesIO
from xhtml2pdf import pisa

source_html = """
<html>
  <head>
    <style>
      @page { size: a4 portrait; margin: 2cm; }
      body  { font-family: Helvetica; color: #1a1a1a; }
      h1    { font-size: 20pt; }
      .total { font-weight: bold; }
    </style>
  </head>
  <body>
    <h1>Invoice #1042</h1>
    <p>Customer: Acme Inc.</p>
    <p class="total">Amount due: 1,240.00</p>
  </body>
</html>
"""

# 1) Write straight to a file on disk (note the binary "wb" mode)
with open("invoice.pdf", "wb") as output_file:
    status = pisa.CreatePDF(src=source_html, dest=output_file)
    if status.err:
        raise RuntimeError("xhtml2pdf failed to render the PDF")

# 2) Render to an in-memory buffer and return the raw bytes
def html_to_pdf_bytes(html: str) -> bytes:
    """Render an HTML string to PDF and return the bytes (raises on error)."""
    buffer = BytesIO()
    status = pisa.CreatePDF(src=html, dest=buffer, encoding="utf-8")
    if status.err:
        raise RuntimeError("xhtml2pdf failed to render the PDF")
    return buffer.getvalue()

pdf_bytes = html_to_pdf_bytes(source_html)

The BytesIO version is the one you want for web apps and APIs: it keeps the PDF in memory so you can return it in an HTTP response or attach it to an email without ever writing a temporary file to disk.
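
As one illustration of the email case, the sketch below attaches in-memory PDF bytes to a message using only the standard library. The function name build_pdf_email, the sender address, and the placeholder bytes are assumptions for the example; in practice you would pass the bytes returned by html_to_pdf_bytes().

```python
from email.message import EmailMessage


def build_pdf_email(pdf_bytes: bytes, recipient: str,
                    filename: str = "invoice.pdf") -> EmailMessage:
    """Build an email with the PDF attached, without touching the disk."""
    message = EmailMessage()
    message["Subject"] = "Your invoice"
    message["From"] = "billing@example.com"  # hypothetical sender address
    message["To"] = recipient
    message.set_content("Your invoice is attached as a PDF.")
    # Attach the raw bytes with an application/pdf content type.
    message.add_attachment(
        pdf_bytes,
        maintype="application",
        subtype="pdf",
        filename=filename,
    )
    return message


# Placeholder bytes for illustration only; pass real PDF bytes in practice.
message = build_pdf_email(b"%PDF-1.4 placeholder", "customer@example.com")
attachment = next(message.iter_attachments())
print(attachment.get_content_type(), attachment.get_filename())
```

Sending is then a matter of handing the message to your mail transport, for example smtplib's send_message(), configured for your own server.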

Render an HTML template to PDF

In real projects you do not hard-code HTML — you render a template with your data first, then hand the resulting string to pisa. Any templating engine works because xhtml2pdf only cares about the final HTML string. Here it is with Jinja2 in a plain Python script:

from jinja2 import Template

template = Template("""
<html>
  <head><style>@page { size: a4; margin: 2cm; }</style></head>
  <body>
    <h1>Receipt for {{ customer }}</h1>
    <table>
      {% for item in items %}
        <tr><td>{{ item.name }}</td><td>{{ item.price }}</td></tr>
      {% endfor %}
    </table>
  </body>
</html>
""")

html = template.render(
    customer="Acme Inc.",
    items=[
        {"name": "Setup", "price": "500.00"},
        {"name": "Support", "price": "740.00"},
    ],
)
pdf_bytes = html_to_pdf_bytes(html)   # reuse the helper from above
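
Because only the final string matters, even the standard library is enough for very small documents. The sketch below is one possible approach using string.Template, with html.escape applied to every value so that characters like & or < in customer data cannot break the markup. The names ROW, PAGE, and render_receipt_html are made up for this example. With Jinja2 you would typically get the same protection by creating templates through an Environment with autoescape enabled.

```python
from html import escape
from string import Template

# One template for a table row and one for the page that wraps the rows.
ROW = Template("<tr><td>$name</td><td>$price</td></tr>")
PAGE = Template(
    "<html><body><h1>Receipt for $customer</h1>"
    "<table>$rows</table></body></html>"
)


def render_receipt_html(customer, items):
    """Build the receipt HTML, escaping every user-supplied value."""
    rows = "".join(
        ROW.substitute(name=escape(item["name"]), price=escape(item["price"]))
        for item in items
    )
    return PAGE.substitute(customer=escape(customer), rows=rows)


receipt_html = render_receipt_html(
    "Acme & Sons",
    [{"name": "Setup", "price": "500.00"}],
)
print(receipt_html)
```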

Return a PDF from a Django view

Django integration is the most common use case. Render a Django template to an HTML string with render_to_string(), pipe it through pisa.CreatePDF(), and write the result straight into an HttpResponse whose content_type is application/pdf. The Content-Disposition header controls whether the browser downloads the file (attachment) or shows it inline (inline):

from django.http import HttpResponse
from django.template.loader import render_to_string
from xhtml2pdf import pisa


def invoice_pdf(request, invoice_id):
    # get_invoice() stands in for your own lookup, e.g. a model query.
    invoice = get_invoice(invoice_id)
    html = render_to_string("invoices/invoice.html", {"invoice": invoice})

    response = HttpResponse(content_type="application/pdf")
    response["Content-Disposition"] = f'attachment; filename="invoice-{invoice_id}.pdf"'

    # An HttpResponse is file-like, so pisa can write the PDF straight into it.
    # link_callback resolves image and CSS paths; it is defined in the next section.
    status = pisa.CreatePDF(src=html, dest=response, link_callback=link_callback)
    if status.err:
        return HttpResponse("Error generating PDF", status=500)
    return response
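
If you switch between downloading and inline display in several views, a small helper keeps the header value consistent. This is only a sketch: content_disposition is a hypothetical name, and it deliberately handles plain ASCII filenames without quotes rather than the full header grammar.

```python
def content_disposition(filename: str, inline: bool = False) -> str:
    """Build a Content-Disposition value for a PDF response."""
    # Reject characters that would break out of the quoted filename.
    if '"' in filename or "\r" in filename or "\n" in filename:
        raise ValueError("filename must not contain quotes or line breaks")
    kind = "inline" if inline else "attachment"
    return f'{kind}; filename="{filename}"'


# In the view: response["Content-Disposition"] = content_disposition(name)
download_header = content_disposition("invoice-1042.pdf")
inline_header = content_disposition("invoice-1042.pdf", inline=True)
print(download_header)
print(inline_header)
```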

Make images and CSS show up: link_callback

The single most common xhtml2pdf complaint is "my images and stylesheets are missing from the PDF." This happens because xhtml2pdf has no page URL to resolve relative links against, the way a browser does. When it sees <img src="/static/logo.png"> or a linked stylesheet with a relative URL, it tries to read that path from the local filesystem and silently skips anything it cannot resolve.

The fix is a link_callback function that maps the URLs in your HTML (/static/..., /media/...) to absolute paths on disk. Pass it to CreatePDF and pisa will use it to locate every asset:

import os
from django.conf import settings
from django.contrib.staticfiles import finders


def link_callback(uri, rel):
    """
    Resolve the asset URLs in the HTML to absolute filesystem paths so
    xhtml2pdf can embed images, CSS, and fonts into the PDF.
    """
    s_url, s_root = settings.STATIC_URL, settings.STATIC_ROOT
    m_url, m_root = settings.MEDIA_URL, settings.MEDIA_ROOT

    if m_url and uri.startswith(m_url):
        # Uploaded media lives under MEDIA_ROOT.
        path = os.path.join(m_root, uri[len(m_url):])
    elif s_url and uri.startswith(s_url):
        # The staticfiles finders expect a path relative to the static
        # directories, so strip the STATIC_URL prefix first (works in dev).
        relative_path = uri[len(s_url):]
        found = finders.find(relative_path)
        if found:
            path = os.path.realpath(found)
        else:
            # Fall back to STATIC_ROOT (files gathered by collectstatic).
            path = os.path.join(s_root or "", relative_path)
    else:
        return uri  # leave absolute http(s):// URLs untouched

    if not os.path.isfile(path):
        raise FileNotFoundError(f"link_callback could not find file: {path}")
    return path

Two practical tips: reference assets with the same STATIC_URL / MEDIA_URL prefixes you use everywhere else in Django, and remember that in production you must have run collectstatic so the files actually exist under STATIC_ROOT. For remote images you can also use absolute https:// URLs, which the callback passes straight through.
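
Outside Django the same idea applies, since a link_callback is just a function that takes (uri, rel) and returns a path or URL. The sketch below is a framework-free variant under that assumption: make_link_callback and the prefix-to-directory mapping are hypothetical names for this example, and the temporary directory only exists to demonstrate the lookup.

```python
import os
import tempfile


def make_link_callback(url_roots):
    """Return a link_callback mapping URL prefixes to local directories.

    url_roots is a dict such as {"/static/": "/srv/app/static"}.
    """
    def link_callback(uri, rel):
        for prefix, root in url_roots.items():
            if uri.startswith(prefix):
                root_real = os.path.realpath(root)
                path = os.path.realpath(os.path.join(root_real, uri[len(prefix):]))
                # Refuse paths that escape the configured directory.
                if os.path.commonpath([root_real, path]) != root_real:
                    raise ValueError(f"path escapes asset root: {uri}")
                if not os.path.isfile(path):
                    raise FileNotFoundError(f"asset not found: {path}")
                return path
        return uri  # anything else, e.g. absolute https:// URLs, is untouched
    return link_callback


# Demonstration with a throwaway directory standing in for a static folder.
with tempfile.TemporaryDirectory() as static_dir:
    logo_path = os.path.join(static_dir, "logo.png")
    with open(logo_path, "wb") as handle:
        handle.write(b"")
    callback = make_link_callback({"/static/": static_dir})
    expected_local = os.path.realpath(logo_path)
    resolved_local = callback("/static/logo.png", None)
    resolved_remote = callback("https://example.com/logo.png", None)

print(resolved_local == expected_local, resolved_remote)
```

You would then pass the returned function as link_callback= in the CreatePDF call, exactly as in the Django view.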

Page setup, headers, and page breaks

xhtml2pdf uses the CSS @page rule for page size and margins, and the xhtml2pdf-specific @frame construct for repeating headers and footers (such as page numbers). For multi-page documents you control pagination with the standard page-break-before / page-break-after CSS, or the <pdf:nextpage /> tag:

<style>
  @page {
    size: a4 portrait;
    margin: 2cm;

    /* A footer frame that repeats on every page */
    @frame footer_frame {
      -pdf-frame-content: footer_content;
      bottom: 1cm; margin-left: 2cm; margin-right: 2cm; height: 1cm;
    }
  }

  /* Force the next section onto a new page */
  .new-page { page-break-before: always; }
</style>

<!-- main content here ... -->

<div class="new-page">
  <h1>Second page</h1>
</div>

<!-- the element rendered into the footer frame on every page -->
<div id="footer_content">
  Page <pdf:pagenumber /> of <pdf:pagecount />
</div>
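
When the sections of a document are generated separately, the <pdf:nextpage /> tag can be inserted from Python rather than written into each template. A minimal sketch, assuming each section is already an HTML fragment (join_sections is a hypothetical helper name):

```python
PAGE_BREAK = "<pdf:nextpage />"


def join_sections(sections):
    """Join HTML fragments so each non-empty one starts on a new page."""
    non_empty = [section for section in sections if section.strip()]
    body = PAGE_BREAK.join(non_empty)
    return f"<html><body>{body}</body></html>"


document = join_sections([
    "<h1>Summary</h1>",
    "   ",                 # blank fragments are skipped, so no empty pages
    "<h1>Details</h1>",
    "<h1>Terms</h1>",
])
print(document)
```

Joining with the tag, rather than appending it after every section, avoids a trailing break that could leave a blank final page.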

What xhtml2pdf can and cannot do

Being honest about the limits saves you hours. xhtml2pdf renders a practical subset of HTML and CSS — enough for structured, document-style pages but not for app-style layouts.

Works well: tables, basic typography, colors, borders, background colors, images, absolute positioning inside frames, @page margins and size, repeating headers/footers via @frame, page numbers (<pdf:pagenumber />), page breaks, and embedding custom TrueType fonts with @font-face.

Does not work, or works poorly: modern CSS layout (no Flexbox and no CSS Grid), floats (support is limited), web fonts loaded over HTTP, and anything that depends on JavaScript (xhtml2pdf has no JS engine at all). If your template leans on Flexbox/Grid, or you need pixel-faithful rendering of a complex branded layout, xhtml2pdf will fight you and WeasyPrint is the better choice.
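
A quick pre-flight scan can flag templates that rely on these features before you spend time debugging the output. The sketch below is a rough heuristic, not a CSS parser: it only looks for a few tell-tale patterns, and the names UNSUPPORTED_PATTERNS and find_unsupported_features are invented for this example.

```python
import re

# Tell-tale patterns for features xhtml2pdf is not expected to render.
UNSUPPORTED_PATTERNS = {
    "flexbox": re.compile(r"display\s*:\s*(inline-)?flex\b", re.IGNORECASE),
    "grid": re.compile(r"display\s*:\s*(inline-)?grid\b", re.IGNORECASE),
    "javascript": re.compile(r"<script\b", re.IGNORECASE),
}


def find_unsupported_features(html):
    """Return the names of unsupported features detected in the HTML."""
    return [
        name
        for name, pattern in UNSUPPORTED_PATTERNS.items()
        if pattern.search(html)
    ]


sample = "<style>.row { display: flex; }</style><script>draw()</script>"
flagged = find_unsupported_features(sample)
print(flagged)
```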

Because it is pure Python with no native binaries, xhtml2pdf is the easiest of the HTML-to-PDF tools to deploy — a plain pip install works inside slim Docker images and AWS Lambda functions where you cannot install wkhtmltopdf or Pango.
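
For the Lambda case, a handler might look roughly like the sketch below. It assumes an API Gateway proxy integration with binary media types enabled, and an event carrying the HTML under an "html" key; both are assumptions for illustration, as are the names pdf_api_response and handler.

```python
import base64
from io import BytesIO


def pdf_api_response(pdf_bytes, filename="document.pdf"):
    """Wrap PDF bytes in a proxy-style response with a base64 body."""
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/pdf",
            "Content-Disposition": f'attachment; filename="{filename}"',
        },
        "body": base64.b64encode(pdf_bytes).decode("ascii"),
        "isBase64Encoded": True,
    }


def handler(event, context):
    """Hypothetical Lambda entry point; expects event["html"] (an assumption)."""
    # Imported here so the response helper above stays dependency-free.
    from xhtml2pdf import pisa

    buffer = BytesIO()
    status = pisa.CreatePDF(src=event["html"], dest=buffer)
    if status.err:
        return {"statusCode": 500, "body": "Error generating PDF"}
    return pdf_api_response(buffer.getvalue())


# The helper can be exercised locally with placeholder bytes.
example_response = pdf_api_response(b"%PDF-1.4 placeholder", "invoice.pdf")
print(example_response["headers"]["Content-Type"])
```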

Which Python HTML-to-PDF library should you use?

There is no single best tool — it depends on how complex your documents are and what you can install on the host. Here is how the main options compare in 2026:

Tool | Pure Python? | CSS support | Best for | Maintenance
---- | ------------ | ----------- | -------- | -----------
xhtml2pdf (pisa) | Yes: pip only, built on ReportLab | Limited subset; no Flexbox/Grid | Simple templated docs: invoices, receipts, reports | Maintained (slow cadence)
WeasyPrint | Mostly: needs Pango system libs | Excellent modern + Paged Media CSS | Complex, styled, branded documents | Actively maintained
pdfkit (wkhtmltopdf) | No: needs the wkhtmltopdf binary | Dated WebKit engine | Legacy HTML, existing systems | wkhtmltopdf is archived / unmaintained
ReportLab | Yes | N/A: you draw the layout in code (no HTML) | Pixel-precise, data-driven PDFs | Actively maintained (OSS + commercial)

How to choose:

  • xhtml2pdf — pick it when your documents are simple and templated and you value a zero-binary, pip-only install (Lambda, slim containers). That is the focus of this article.
  • WeasyPrint — pick it for anything with real styling or complex layout; it has far better CSS support. See generating PDFs from HTML in Django with WeasyPrint.
  • pdfkit / wkhtmltopdf — only for systems that already use it; note that wkhtmltopdf is now archived and unmaintained, so avoid it for new work. Details in creating PDF files in Python with pdfkit.
  • ReportLab — pick it when you would rather build the PDF programmatically than maintain HTML templates (xhtml2pdf already uses it under the hood).

Going the other way — reading data out of existing PDFs and Office files instead of creating them? See extracting text and data from PDF and Microsoft Office files in Python.

Frequently Asked Questions

Should I use xhtml2pdf or WeasyPrint?

Use xhtml2pdf for simple, templated documents (invoices, receipts, basic reports) when you want a pure-Python, pip-only install with no system binaries — ideal for AWS Lambda and slim containers. Use WeasyPrint when your document needs modern or complex CSS: it supports far more, including better layout and Paged Media features, at the cost of requiring the native Pango libraries on the host.

Why don't my images and CSS show up in the PDF?

Because xhtml2pdf reads relative asset URLs such as /static/logo.png from the local filesystem, and it silently drops any path it cannot resolve. Supply a link_callback to pisa.CreatePDF() that maps your STATIC_URL and MEDIA_URL paths to absolute paths on disk, and in production make sure you have run collectstatic so the files exist. Absolute https:// URLs can be passed through untouched.

How do I return a PDF from a Django view?

Render your template to a string with render_to_string(), create an HttpResponse(content_type="application/pdf"), then call pisa.CreatePDF(src=html, dest=response, link_callback=link_callback) to write the PDF straight into the response. Set the Content-Disposition header to attachment; filename="..." to download it, or inline to display it in the browser.

Does xhtml2pdf support modern CSS like Flexbox and Grid?

No. xhtml2pdf supports only a limited, legacy subset of CSS — there is no Flexbox or CSS Grid, float support is limited, and there is no JavaScript at all. It handles tables, basic typography, colors, images, @page setup, and @frame headers/footers well. For layouts that depend on Flexbox or Grid, use WeasyPrint instead.

How do I add page breaks in xhtml2pdf?

Use the standard CSS page-break-before: always; or page-break-after: always; on an element, or insert the xhtml2pdf-specific <pdf:nextpage /> tag where you want a new page to begin. Page size and margins are set with the CSS @page rule, and repeating headers or footers (including <pdf:pagenumber />) go inside an @frame.

Is xhtml2pdf a good choice for production?

Yes, for the right job. It is stable, well suited to high-volume generation of simple templated documents, and its pure-Python install makes deployment trivial. The caveats are its limited CSS support and a relatively slow release cadence — if your documents are visually complex or need cutting-edge CSS, choose WeasyPrint; for programmatic, pixel-precise output, choose ReportLab.

Wrapping up

xhtml2pdf is the simplest way to turn HTML into PDFs in Python when your documents are templated and your priority is a clean, binary-free deployment: pip install xhtml2pdf, render your template to a string, call pisa.CreatePDF() into a BytesIO buffer or an HttpResponse, and wire up a link_callback so your images and CSS appear. Reach for WeasyPrint when you outgrow its CSS limits, and ReportLab when you would rather build the document in code.

MicroPyramid has delivered 50+ projects involving document generation, reporting, and templating across Python and Django applications over the last 12+ years. If you need a reliable, scalable PDF pipeline, we can help you choose the right engine and build it.
