org.pdfbox.pdfparser

Class PDFParser


public class PDFParser
extends BaseParser

This class will handle the parsing of the PDF document.
Version:
$Revision: 1.53 $
Author:
Ben Litchfield

Field Summary

Fields inherited from class org.pdfbox.pdfparser.BaseParser

DEF, ENDSTREAM, pdfSource

Constructor Summary

PDFParser(InputStream input)
Constructor.
PDFParser(InputStream input, RandomAccess rafi)
Constructor to allow control over RandomAccessFile.

Method Summary

COSDocument
getDocument()
This will get the document that was parsed. parse() must be called before this is called.
FDFDocument
getFDFDocument()
This will get the FDF document that was parsed.
PDDocument
getPDDocument()
This will get the PD document that was parsed.
void
parse()
This will parse the stream and create the PDF document.
protected PDFXref
parseXrefSection()
This will parse the xref table and trailers from the stream.
protected void
parseXrefTable(int[] params)
This will parse the xref table from the stream.
void
setTempDirectory(File tmpDir)
This sets the directory where pdfbox will create the temporary file that stores the PDF document streams.
protected void
skipHeaderFillBytes()
This will skip a header's binary fill bytes.

Methods inherited from class org.pdfbox.pdfparser.BaseParser

addXref, getXrefs, isClosing, isClosing, isEOL, isEOL, isEndOfName, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces

Constructor Details

PDFParser

public PDFParser(InputStream input)
            throws IOException
Constructor.
Parameters:
input - The input stream that contains the PDF document.

PDFParser

public PDFParser(InputStream input,
                 RandomAccess rafi)
            throws IOException
Constructor to allow control over RandomAccessFile.
Parameters:
input - The input stream that contains the PDF document.
rafi - The RandomAccessFile to be used by the internal COSDocument.

Method Details

getDocument

public COSDocument getDocument()
            throws IOException
This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.
Returns:
The document that was parsed.

getFDFDocument

public FDFDocument getFDFDocument()
            throws IOException
This will get the FDF document that was parsed. When you are done with this document you must call close() on it to release resources.
Returns:
The FDF document that was parsed.

getPDDocument

public PDDocument getPDDocument()
            throws IOException
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
Returns:
The document at the PD layer.

parse

public void parse()
            throws IOException
This will parse the stream and create the PDF document. The stream is closed once parsing is done.

parseXrefSection

protected PDFXref parseXrefSection()
            throws IOException
This will parse the xref table and trailers from the stream.
Returns:
a new PDFXref

parseXrefTable

protected void parseXrefTable(int[] params)
            throws IOException
This will parse the xref table from the stream. It stores the starting object number and the count.
Parameters:
params - The start and count parameters
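
For orientation, the sketch below shows how the start and count values of a cross-reference subsection relate to the entries that follow them in a PDF file. It is a standalone illustration, not the pdfbox implementation: the class XrefSubsectionSketch, its nested Entry type, and the methods parseHeader and parseEntries are hypothetical names introduced here, and the input is assumed to be already split into lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in pdfbox.
final class XrefSubsectionSketch {

    /** One cross-reference entry: byte offset, generation number, in-use flag. */
    static final class Entry {
        final int objectNumber;
        final long offset;
        final int generation;
        final boolean inUse;

        Entry(int objectNumber, long offset, int generation, boolean inUse) {
            this.objectNumber = objectNumber;
            this.offset = offset;
            this.generation = generation;
            this.inUse = inUse;
        }
    }

    /** Parses a subsection header such as "0 3" into {start, count}. */
    static int[] parseHeader(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'start count' but got: " + line);
        }
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** Reads params[1] entries; entry i describes object number params[0] + i. */
    static List<Entry> parseEntries(int[] params, List<String> lines) {
        if (lines.size() < params[1]) {
            throw new IllegalArgumentException("Subsection is shorter than its count");
        }
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < params[1]; i++) {
            // Entry layout: 10-digit offset, 5-digit generation, 'n' (in use) or 'f' (free).
            String[] parts = lines.get(i).trim().split("\\s+");
            long offset = Long.parseLong(parts[0]);
            int generation = Integer.parseInt(parts[1]);
            boolean inUse = parts[2].equals("n");
            entries.add(new Entry(params[0] + i, offset, generation, inUse));
        }
        return entries;
    }

    public static void main(String[] args) {
        int[] params = parseHeader("0 3");
        List<Entry> entries = parseEntries(params, Arrays.asList(
                "0000000000 65535 f",
                "0000000017 00000 n",
                "0000000081 00000 n"));
        for (Entry e : entries) {
            System.out.println(e.objectNumber + " -> offset " + e.offset
                    + (e.inUse ? " (in use)" : " (free)"));
        }
    }
}
```

The two-element array mirrors the int[] params argument of parseXrefTable; whether pdfbox fills it in exactly this way is an assumption.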

setTempDirectory

public void setTempDirectory(File tmpDir)
This sets the directory where pdfbox will create the temporary file that stores the PDF document streams. By default, this directory is the value of the system property java.io.tmpdir.
Parameters:
tmpDir - The directory to create scratch files needed to store pdf document streams.
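
The default described above can be pictured with the following standalone sketch, which falls back to java.io.tmpdir when no directory has been set and creates a scratch file there. It is not taken from pdfbox: the class ScratchFileSketch, its methods, and the "pdfbox"/".tmp" file name parts are assumptions made for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the pdfbox implementation.
final class ScratchFileSketch {

    /** Returns tmpDir if one was supplied, otherwise the java.io.tmpdir directory. */
    static File resolveTempDirectory(File tmpDir) {
        return tmpDir != null ? tmpDir : new File(System.getProperty("java.io.tmpdir"));
    }

    /** Creates an empty scratch file in the resolved directory. */
    static File createScratchFile(File tmpDir) {
        try {
            // The prefix and suffix are assumed values chosen for this example.
            File scratch = File.createTempFile("pdfbox", ".tmp", resolveTempDirectory(tmpDir));
            scratch.deleteOnExit();
            return scratch;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File scratch = createScratchFile(null); // null means "use the default directory"
        System.out.println("Scratch file: " + scratch.getAbsolutePath());
    }
}
```

Pointing the scratch directory at a volume with enough free space can matter for large documents, since the stream data is written there rather than held in memory.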

skipHeaderFillBytes

protected void skipHeaderFillBytes()
            throws IOException
This will skip a header's binary fill bytes. This is in accordance with the PDF Specification 1.5, pg. 68, section 3.4.1, "Syntax.File Structure.File Header".
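
The PDF specification recommends that a file containing binary data follow its "%PDF-1.x" header line with a comment line holding at least four bytes whose values are 128 or greater. The sketch below shows one way to step past such a line in a byte array. It is a simplified illustration rather than the pdfbox code: the class HeaderFillSketch and its methods are hypothetical, and it assumes that any comment line directly after the header can be skipped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the pdfbox implementation.
final class HeaderFillSketch {

    /** Returns the index just past the end-of-line marker (CR, LF, or CRLF) of the line at pos. */
    static int nextLine(byte[] data, int pos) {
        while (pos < data.length && data[pos] != '\r' && data[pos] != '\n') {
            pos++;
        }
        if (pos < data.length && data[pos] == '\r') {
            pos++;
        }
        if (pos < data.length && data[pos] == '\n') {
            pos++;
        }
        return pos;
    }

    /** Returns the index of the first byte after the header line and any fill line. */
    static int skipHeaderAndFill(byte[] data) {
        int pos = nextLine(data, 0); // step over "%PDF-1.x"
        // Assumption: a comment line right after the header is the binary fill line.
        if (pos < data.length && data[pos] == '%') {
            pos = nextLine(data, pos);
        }
        return pos;
    }

    public static void main(String[] args) {
        byte[] data = "%PDF-1.4\n%\u00e2\u00e3\u00cf\u00d3\n1 0 obj"
                .getBytes(StandardCharsets.ISO_8859_1);
        int bodyStart = skipHeaderAndFill(data);
        System.out.println("Body starts at byte " + bodyStart);
    }
}
```

Working on a byte array keeps the example short; a stream-based parser would need a pushback or peek facility to inspect the first byte of the line before deciding whether to skip it.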