Batch convert web pages to PDF

With the batch API you can easily convert a whole set of web pages into a single PDF as well as into an archive or Zip file with individual PDFs. This allows you to convert your entire website into PDF or to take different webpages from different sources and convert them into one PDF. You can use it to create customized PDF backups of your complete site or to create high-quality reports, e-books, brochures and much more!
We also offer a manual way of doing all this in our members area. Read more about that here!

Features

  • Convert an entire web site to a single PDF as well as into individual PDF documents per web page.
  • Batch process a set of URLs or web pages for conversion and get an archive or Zip file with PDFs back.
  • Very fast conversion times and easy monitoring of progress.
  • Control options for PDF layout, headers and footers and much more!
  • Each PDF in the batch can have different layout and other options!
  • Convert or exclude parts of a webpage
  • No installation required!

Basic Usage

Our batch Web to PDF API takes a license and either a set of URLs or a sitemap as input and then schedules a job on one of our servers. As soon as the job is finished you will be notified by email. You can also monitor the progress yourself.

All you need to do is send a request similar to this, which converts the pages of http://www.example.com (as listed in its sitemap) to one large PDF:

https://pdfmyurl.com/batch_api?license=yourlicense&sitemap=http://www.example.com/sitemap.xml

Or by using the urls[] array like this:

https://pdfmyurl.com/batch_api?license=yourlicense&urls[]=http://www.example.com/page1.html&urls[]=http://www.example.com/page2.html

Each request should at least have the following components:

  1. endpoint: this should be http://pdfmyurl.com/batch_api or https://pdfmyurl.com/batch_api for secure access
  2. license: this is mandatory and you will get yours when you sign up
  3. urls[]: an array with web pages that you want to convert OR
  4. sitemap: the URL of your sitemap in XML format, which you can create online here if you don't have one yet!
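
The request components listed above can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper name build_batch_url is hypothetical, and it assumes the service accepts percent-encoded query strings (urls[] is sent as urls%5B%5D), which is how Python's standard library encodes them.

```python
from urllib.parse import urlencode

BATCH_ENDPOINT = "https://pdfmyurl.com/batch_api"


def build_batch_url(license_key, urls=None, sitemap=None, **options):
    """Build a batch API request URL from a list of URLs or a sitemap URL.

    Hypothetical helper: exactly one of `urls` or `sitemap` must be given.
    Extra keyword arguments are passed through as query parameters.
    """
    if bool(urls) == bool(sitemap):
        raise ValueError("pass either urls or sitemap, but not both")
    params = [("license", license_key)]
    if sitemap:
        params.append(("sitemap", sitemap))
    else:
        # Repeat the urls[] key once per page, as in the example request.
        params.extend(("urls[]", url) for url in urls)
    for name, value in options.items():
        # The documentation describes boolean switches as true/false.
        if isinstance(value, bool):
            value = "true" if value else "false"
        params.append((name, value))
    return BATCH_ENDPOINT + "?" + urlencode(params)


print(build_batch_url("yourlicense", sitemap="http://www.example.com/sitemap.xml"))
```

The resulting URL can be requested with any HTTP client. Percent-encoding the page URLs keeps characters such as & or ? inside a page URL from being read as part of the batch request itself.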

Additionally the following optional parameters determine the way the batch is treated:

  • merge: set this to true if you want a single PDF consisting of all your conversions OR set it to false if you only want an archive or Zip file with a PDF for each single conversion. You will always get the archive/Zip file and by default we will assume you also want a single large PDF of all pages combined.
  • include_toc: set this to true if you want a table of contents at the start of your PDF. This only works if you set the merge parameter to true.
  • archive_type: set this to zip if you want a Zip file or tar if you want a gzipped Tar archive. As a default you will get the gzipped Tar archive.
  • archive_filenaming: use this to control the filenames in the archive. You can use the following masks in the filename:
    • %url - the URL of the webpage
    • %title - the title of the webpage
    • %YYYY, %MM, %DD, %hh, %mm, %ss - year, month, day, hour, minute and second
    • So if you set this to %url.pdf you will get an archive of PDFs with the name based on the URL. If you don't use this parameter then all your PDFs will be numbered consecutively.
  • dropbox: set this to true if you want to have the result saved to your Dropbox account. This only works if you have authorized our service to access your Dropbox account after you bought the upgrade to the advanced website conversion page.

NB: Each URL in the batch counts as one conversion and by default we allow a maximum of 500 conversions per batch. Please contact us if you need more conversions per batch.
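
If a list of pages exceeds the default limit, one option is to split it into several batches before submitting. The sketch below assumes the default of 500 conversions per batch mentioned above; the function name split_into_batches is hypothetical.

```python
MAX_CONVERSIONS_PER_BATCH = 500  # default limit per batch stated above


def split_into_batches(urls, batch_size=MAX_CONVERSIONS_PER_BATCH):
    """Split a list of URLs into consecutive chunks of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [urls[i:i + batch_size] for i in range(0, len(urls), batch_size)]


pages = [f"http://www.example.com/page{n}.html" for n in range(1, 1202)]
batches = split_into_batches(pages)
print([len(batch) for batch in batches])
```

Each chunk becomes its own job, so a merged PDF would only cover the pages of that chunk. If you need a single PDF of more than 500 pages, contacting support for a higher limit is the better route.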

Setting Options

All the options that you set in our members area are used as defaults for the batch API as well. You can also set all options that we support for the regular HTML to PDF API. This includes options for page layout, headers and footers, but also encryption settings, watermarking options and custom backgrounds. You can set these options as default for all the conversions in the batch and then override them for each conversion in the batch.

Example: If you want to set the default page size of the conversions to A4 and the default orientation to portrait, you would use the following request.

https://pdfmyurl.com/batch_api?license=yourlicense&sitemap=http://www.example.com/sitemap.xml&page_size=A4&orientation=portrait

The table below shows some examples of the settings that you can change to control the conversions. For a full overview of options please refer to the documentation of the HTML to PDF API.

Parameter        Description
page_size        Page size, such as A4, Letter etc. See the full list for details.
orientation      Portrait or Landscape orientation
content          Controls which content of the page you want to convert or exclude. See part of page conversion for more info.
css_media_type   Use print for the print friendly layout (CSS media type 'print') if your web page has one
header           HTML that you want to use as header
footer           HTML that you want to use as footer

The header and footer can be specified in HTML and we also support extra parameters for dates and page numbers. Have a look at the extensive documentation.

Overriding options

The options that you have used above are defaults that will be taken for all conversions in the batch. In a lot of cases you may want to have different options for different web pages within the batch and then you can override these options on a case by case basis.

You might want to have all pages in your final PDF in landscape, but the page(s) of the first conversion in portrait. Using the 'exceptions' parameter will allow you to do so.

Example                               Explanation
exceptions[1][orientation]=portrait   Sets the orientation of the 1st PDF in the batch to portrait.
exceptions[2][page_size]=A3           Sets the page size of the 2nd PDF in the batch to A3.
exceptions[12][width]=210             Sets the width of the 12th PDF to 210mm (mm is default if nothing is specified). This will create this 12th PDF as a single long one page PDF.
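
When a batch has many overrides, writing the exceptions[n][option] keys by hand gets error-prone. The sketch below generates them from a dictionary. The helper name exception_params is hypothetical, and the positions are 1-based to match the examples above.

```python
def exception_params(overrides):
    """Turn {position: {option: value}} into exceptions[n][option] pairs.

    `position` is the 1-based index of the conversion within the batch.
    Returns a list of (key, value) tuples ready to append to a query string.
    """
    params = []
    for position in sorted(overrides):
        if position < 1:
            raise ValueError("positions are 1-based")
        for option, value in overrides[position].items():
            params.append((f"exceptions[{position}][{option}]", str(value)))
    return params


overrides = {
    1: {"orientation": "portrait"},
    2: {"page_size": "A3"},
    12: {"width": 210},
}
for key, value in exception_params(overrides):
    print(f"{key}={value}")
```

The returned pairs can be added to the other request parameters before URL-encoding the full query string.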

Including a table of contents

If you have chosen to include a table of contents in your PDF, you can define its layout, provided you bought the upgrade to the advanced website conversion page. The documentation explains how you can use HTML and special shortcodes to get the table of contents you want.

These are the settings you can use.

Parameter    Description
toc          Controls the contents and layout of the table of contents page(s).
toc_header   Sets the header of the table of contents page.
toc_footer   Sets the footer of the table of contents page.

Return codes

Our batch API returns HTTP response codes, which you can check to see if the conversion was successful or not. The following is the list of return codes we use.

Code                         Description
200 OK                       Your conversion was processed successfully
400 Bad Request              You didn't specify anything to convert or your data was malformed
401 Authorization required   You specified an invalid license key
429 Too Many Requests        You have overrun a usage limit for your plan
503 Service unavailable      You are sending multiple requests at the same time to the API from the same IP address
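
A client can map these return codes to readable errors. The following is a minimal sketch; the names BatchApiError and check_response are hypothetical and not part of the API.

```python
RETURN_CODES = {
    200: "conversion was processed successfully",
    400: "nothing to convert was specified or the data was malformed",
    401: "invalid license key",
    429: "a usage limit for the plan was overrun",
    503: "multiple simultaneous requests from the same IP address",
}


class BatchApiError(Exception):
    """Raised for any batch API response other than 200 OK."""

    def __init__(self, status_code, reason):
        super().__init__(f"HTTP {status_code}: {reason}")
        self.status_code = status_code
        self.reason = reason


def check_response(status_code):
    """Return None for 200 OK, raise BatchApiError for anything else."""
    if status_code == 200:
        return
    reason = RETURN_CODES.get(status_code, "unexpected status code")
    raise BatchApiError(status_code, reason)


try:
    check_response(401)
except BatchApiError as error:
    print(error)
```

Given the description of 503 above, sending requests one at a time rather than in parallel should avoid that response.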

We also return output with the job details. By default we return it in JSON format, and you can use the output parameter to get it in a different format (json = JSON, text = text, html = HTML or none = no output).

Typically you'll get something like this as a result in case of a successful job creation (shown here in PHP array notation):

array (
  'job' => 5110,
  'priority' => '1',
  'merge' => true,
  'conversions' => 16,
  'urls' => 
  array (
    1 => 'http://www.example.com/',
    2 => 'http://www.example.com/developer',
    3 => 'http://www.example.com/payment-methods-and-security',
    4 => 'http://www.example.com/customer-service',
    5 => 'http://www.example.com/customization',
    6 => 'http://www.example.com/features-overview',
    7 => 'http://www.example.com/contact-us',
    8 => 'http://www.example.com/localization-global-payments',
    9 => 'http://www.example.com/clients',
    10 => 'http://www.example.com/subscription',
    11 => 'http://www.example.com/in-app-purchases',
    12 => 'http://www.example.com/purchases',
    13 => 'http://www.example.com/payments',
    14 => 'http://www.example.com/relationship-rights-management',
    15 => 'http://www.example.com/faq',
    16 => 'http://www.example.com/optimization',
  ),
)

Progress monitoring

You can get one or more notifications when your job completes on our server, or you can track the progress yourself with the batch monitor API (see below). If you just want to get notified upon completion, use the following parameters in the API call:

  1. email: we'll send a job completion notification to the address you specify
  2. callback: we'll POST the job results to the URL you specify here

Callbacks

If you added a callback URL to the API call then we will POST the following data to the callback URL upon completion of the job. You will be able to use these to download the results.

  • job: the job number
  • success: true when the job completed successfully or false when something went wrong
  • conversions: the number of conversions that were executed
  • download: the URL where you can download your resulting PDF or archive
  • archive: the URL where you can download your archive or ZIP file
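
A callback endpoint needs to read these fields from the POST body. The sketch below makes two assumptions that the documentation does not confirm: that the data arrives form-encoded (application/x-www-form-urlencoded), and that success is sent as the text true or false. The function name parse_callback is hypothetical.

```python
from urllib.parse import parse_qs


def parse_callback(body):
    """Parse a form-encoded callback body into a dictionary of job results.

    Assumption: the service POSTs form-encoded fields and sends booleans
    as text. Adjust this if your endpoint receives a different encoding.
    """
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    return {
        "job": int(fields["job"]),
        "success": fields.get("success", "").lower() in ("true", "1"),
        "conversions": int(fields.get("conversions", 0)),
        "download": fields.get("download"),
        "archive": fields.get("archive"),
    }


sample = (
    "job=219&success=true&conversions=27"
    "&download=https%3A%2F%2Fpdfmyurl.com%2Fdownload%3Fid%3D51223c91336af3"
)
print(parse_callback(sample))
```

Logging the raw body of the first callback you receive is a quick way to confirm the actual encoding before relying on this parser.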

Batch monitor API

You can also track the status and progress of your batch jobs via a separate HTTP request. This request by default will give you a JSON response indicating job details and status.

Just send a request similar to:

https://pdfmyurl.com/batch_monitor?license=yourlicensekey

The request can have the following components:

  1. endpoint: this should be http://pdfmyurl.com/batch_monitor or https://pdfmyurl.com/batch_monitor for secure access
  2. license: this is mandatory and you will get yours when you sign up
  3. job: this is optional and will limit the output to only this job.
  4. output: json or text output. This parameter is optional and by default we assume you want a JSON response.

The output will look something like this (shown here in PHP array notation). Please be aware that we may add fields to the structure in the future!

array (
  219 => 
  array (
    'submitted' => '2015-05-07 18:10:17',
    'finished' => '2015-05-07 19:54:00',
    'status' => 'completed',
    'success' => true,
    'conversions' => '27',
    'pages' => '27',
    'download' => 'https://pdfmyurl.com/download?id=51223c91336af3',
    'archive' => 'https://pdfmyurl.com/download?id=51223c91336af3&type=zipfile',
  ),
  220 => 
  array (
    'submitted' => '2015-05-07 18:44:50',
    'status' => 'processing',
    'conversions' => '33 out of 48',
  ),
  221 => 
  array (
    'submitted' => '2015-05-07 18:45:59',
    'status' => 'waiting',
    'conversions' => '0 out of 27',
  ),
)

Note that the 'status' field indicates whether your job is 'waiting', 'processing' or 'completed', and the 'success' field is true when everything went well or false when something went wrong.

We kindly request that you do NOT send non-stop requests to the monitoring API. The data is only updated once per second and we therefore ask you to only send a request every few seconds or so.
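
A polling loop that respects this request could look like the sketch below. It assumes the JSON response is an object keyed by job number with the fields shown in the sample output. The names fetch_status and wait_for_job, the 5-second interval and the poll limit are illustrative choices, not part of the API.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

MONITOR_ENDPOINT = "https://pdfmyurl.com/batch_monitor"


def fetch_status(license_key, job):
    """Request the monitor API for one job and return the decoded JSON."""
    query = urlencode({"license": license_key, "job": job, "output": "json"})
    with urlopen(f"{MONITOR_ENDPOINT}?{query}") as response:
        return json.load(response)


def wait_for_job(license_key, job, fetch=fetch_status, interval=5.0,
                 max_polls=720, sleep=time.sleep):
    """Poll every `interval` seconds until the job status is 'completed'.

    Assumption: the response maps the job number (as a string key) to its
    details. Returns those details, or raises TimeoutError after max_polls.
    """
    for _ in range(max_polls):
        details = fetch(license_key, job)[str(job)]
        if details["status"] == "completed":
            return details
        sleep(interval)  # wait a few seconds between requests, as asked above
    raise TimeoutError(f"job {job} did not complete after {max_polls} polls")
```

Calling wait_for_job("yourlicensekey", 221) would block until the job completes and then return its details, including the success flag and the download URL. The fetch and sleep parameters exist so the loop can be exercised without network access.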