Tabula: Extract data from PDFs

8 November 2013

Copying and pasting data from PDFs can be painful. Users are forced to copy one row at a time or spend hours reformatting large chunks of the file. The process is incredibly time-consuming and also risks losing data.

Tabula, which is especially useful for journalists working closely with data, is an exciting free tool that allows data to be extracted cleanly from PDFs. Users can select a table in its entirety, or even just the data points of interest, and the program will convert it into a plain spreadsheet or text document.

After downloading Tabula, data can be extracted in four simple steps:

1. Upload the PDF
2. Select the area of interest in the table
3. Decide what format you want to download the data in
4. Check and edit the data selection
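Checking the extracted data is easy to skip, but a quick automated pass can catch rows that were split or merged during extraction. The snippet below is a minimal sketch, assuming the data was downloaded as CSV; the function name `check_rows` and the sample data are hypothetical and not part of Tabula. It flags any row whose number of columns differs from the header row.

```python
import csv
import io


def check_rows(csv_text: str) -> list[int]:
    """Return 1-based row numbers whose column count differs from the header row.

    Hypothetical helper for sanity-checking a CSV downloaded from Tabula.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []  # an empty file has nothing to check
    expected = len(rows[0])  # treat the first row as the header
    return [i for i, row in enumerate(rows, start=1) if len(row) != expected]


# Hypothetical sample: the third row lost its "amount" value during extraction.
sample = "name,amount\nDr. A,100\nDr. B\n"
print(check_rows(sample))  # [3]
```

A mismatched row count does not prove the data is wrong, but it is a cheap signal that a row deserves a look against the original PDF.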

Tabula has already helped ProPublica extract information about pharmaceutical payments to medical practitioners for its Dollars for Docs project, and helped the MinnPost report on crime statistics that the Minneapolis Police Department published as PDFs.

If you would like to start using Tabula, it is available for download from its website.