org.pdfbox.pdfparser
Class PDFParser
This class will handle the parsing of the PDF document.
Inherited methods: addXref, getXrefs, isClosing, isClosing, isEOL, isEOL, isEndOfName, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseDirObject, readExpectedString, readInt, readLine, readString, readString, setDocument, skipSpaces
PDFParser
public PDFParser(InputStream input)
throws IOException
Constructor.
input
- The input stream that contains the PDF document.
PDFParser
public PDFParser(InputStream input,
RandomAccess rafi)
throws IOException
Constructor to allow control over RandomAccessFile.
input
- The input stream that contains the PDF document.
rafi
- The RandomAccessFile to be used in the internal COSDocument.
getDocument
public COSDocument getDocument()
throws IOException
This will get the document that was parsed. parse() must be called before this method is called.
When you are done with this document, you must call close() on it to release
resources.
- The document that was parsed.
getFDFDocument
public FDFDocument getFDFDocument()
throws IOException
This will get the FDF document that was parsed. When you are done with
this document you must call close() on it to release resources.
- The document at the PD layer.
getPDDocument
public PDDocument getPDDocument()
throws IOException
This will get the PD document that was parsed. When you are done with
this document you must call close() on it to release resources.
- The document at the PD layer.
parse
public void parse()
throws IOException
This will parse the stream and create the PDF document. This will close
the stream when it is done parsing.
parseXrefSection
protected PDFXref parseXrefSection()
throws IOException
This will parse the xref table and trailers from the stream.
parseXrefTable
protected void parseXrefTable(int[] params)
throws IOException
This will parse the xref table from the stream.
It stores the starting object number and the count.
params
- The start and count parameters
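To illustrate what the start and count values look like: in a PDF file, each cross-reference subsection begins with a line such as "0 6", giving the first object number and then the number of entries. The sketch below parses such a line into a two-element array. It is standalone Java and not PDFBox code; the class name XrefSubsectionHeaderSketch and the method parseHeader are hypothetical, and this page does not say whether parseXrefTable fills or reads the params array.

```java
import java.io.IOException;

class XrefSubsectionHeaderSketch {

    /**
     * Parses a cross-reference subsection header such as "0 6"
     * into {start, count}. Hypothetical helper, not part of PDFBox.
     */
    static int[] parseHeader(String line) throws IOException {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IOException("Expected 'start count' but found: " + line);
        }
        try {
            int start = Integer.parseInt(parts[0]);
            int count = Integer.parseInt(parts[1]);
            if (start < 0 || count < 0) {
                throw new IOException("Negative value in xref header: " + line);
            }
            return new int[] { start, count };
        } catch (NumberFormatException e) {
            throw new IOException("Non-numeric xref header: " + line);
        }
    }

    public static void main(String[] args) throws IOException {
        int[] params = parseHeader("0 6");
        System.out.println("start=" + params[0] + ", count=" + params[1]);
    }
}
```

Returning the pair as an int[] mirrors the params argument shown above; a small value class would be an equally reasonable choice.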
setTempDirectory
public void setTempDirectory(File tmpDir)
This sets the directory in which pdfbox will create the temporary file
used to store PDF document streams. By default, this directory is
the value of the system property java.io.tmpdir.
tmpDir
- The directory in which to create the scratch files needed to store
PDF document streams.
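The fallback to java.io.tmpdir can be pictured with a small standalone sketch. It only uses the standard library; the class ScratchFileSketch, both method names, and the "pdfbox"/".tmp" file naming are assumptions made for illustration, since this page does not describe how pdfbox names or manages its scratch file.

```java
import java.io.File;
import java.io.IOException;

class ScratchFileSketch {

    /** Returns tmpDir if one was set, otherwise the java.io.tmpdir directory. */
    static File resolveTempDirectory(File tmpDir) {
        if (tmpDir != null) {
            return tmpDir;
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }

    /** Creates an empty scratch file in the resolved directory (hypothetical naming). */
    static File createScratchFile(File tmpDir) throws IOException {
        File scratch = File.createTempFile("pdfbox", ".tmp", resolveTempDirectory(tmpDir));
        scratch.deleteOnExit(); // best-effort cleanup when the JVM exits
        return scratch;
    }

    public static void main(String[] args) throws IOException {
        File scratch = createScratchFile(null); // null means "use the default"
        System.out.println("Scratch file: " + scratch.getAbsolutePath());
        scratch.delete();
    }
}
```

Pointing the directory at a volume with enough free space may matter for large documents, because the stream data is written to disk rather than held in memory.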
skipHeaderFillBytes
protected void skipHeaderFillBytes()
throws IOException
This will skip a header's binary fill bytes. This is in accordance with
PDF Specification 1.5 pg 68 section 3.4.1 "Syntax.File Structure.File Header"
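For context, the PDF specification recommends that a file containing binary data follow its header line with a comment line holding at least four bytes with codes of 128 or greater. The sketch below shows one plausible way to skip such a line: if the next non-whitespace byte after the header is not a digit (so it cannot be the start of an object such as "1 0 obj"), the rest of that line is discarded. This is an assumption about the approach, not a copy of the PDFParser implementation, and the class HeaderFillBytesSketch and its helper names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

class HeaderFillBytesSketch {

    /** PDF whitespace: NUL, tab, line feed, form feed, carriage return, space. */
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    /** Consumes whitespace and leaves the stream at the next other byte. */
    static void skipWhitespace(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (isWhitespace(c)) {
            c = in.read();
        }
        if (c != -1) {
            in.unread(c);
        }
    }

    /** Skips an optional fill-byte line that follows the header line. */
    static void skipHeaderFillBytes(PushbackInputStream in) throws IOException {
        skipWhitespace(in);
        int c = in.read();
        if (c == -1) {
            return; // end of stream, nothing to skip
        }
        if (c >= '0' && c <= '9') {
            in.unread(c); // an object number starts here, so no fill bytes
            return;
        }
        // Discard everything up to the end of the fill-byte line.
        while (c != -1 && c != '\r' && c != '\n') {
            c = in.read();
        }
        skipWhitespace(in);
    }

    public static void main(String[] args) throws IOException {
        byte[] afterHeader = {
            '%', (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3, '\n',
            '1', ' ', '0', ' ', 'o', 'b', 'j'
        };
        PushbackInputStream in =
                new PushbackInputStream(new ByteArrayInputStream(afterHeader));
        skipHeaderFillBytes(in);
        System.out.println("Next byte: " + (char) in.read());
    }
}
```

A PushbackInputStream with its default one-byte buffer is enough here because the sketch never needs to look more than one byte ahead.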