org.pdfbox.util

Class PDFTextStripperByArea


public class PDFTextStripperByArea
extends PDFTextStripper

This will extract text from a specified region in the PDF.
Version:
$Revision: 1.5 $
Author:
Ben Litchfield

Field Summary

Fields inherited from class org.pdfbox.util.PDFTextStripper

charactersByArticle, output

Constructor Summary

PDFTextStripperByArea()
Constructor.

Method Summary

void
addRegion(String regionName, Rectangle2D rect)
Add a new region to group text by.
void
extractRegions(PDPage page)
Process the page to extract the region text.
protected void
flushText()
This will print the text to the output stream.
List
getRegions()
Get the list of regions that have been set up.
String
getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().
protected void
showCharacter(TextPosition text)

Methods inherited from class org.pdfbox.util.PDFTextStripper

endDocument, endPage, endParagraph, flushText, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText

Methods inherited from class org.pdfbox.util.PDFStreamEngine

getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showCharacter, showString

Constructor Details

PDFTextStripperByArea

public PDFTextStripperByArea()
            throws IOException
Constructor.

Method Details

addRegion

public void addRegion(String regionName,
                      Rectangle2D rect)
Add a new region to group text by.
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
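Example (illustrative only): the rect argument is a java.awt.geom.Rectangle2D, so a region can be built with the standard Rectangle2D.Double class. The sketch below uses only the JDK. The coordinate values and the region() helper are made up for illustration, and this page does not state the units or the origin of the coordinate system, so check those against the library before relying on specific numbers.

```java
import java.awt.geom.Rectangle2D;

public class RegionRectangleSketch {
    // Hypothetical helper (not part of the library): builds a region
    // rectangle from its corner (x, y) and its width and height.
    static Rectangle2D region(double x, double y, double width, double height) {
        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // Made-up coordinates for a region named "header".
        Rectangle2D header = region(10, 280, 275, 60);

        // Rectangle2D.contains(x, y) reports whether a point lies inside
        // the rectangle; a point-in-rectangle test like this is one
        // plausible way to decide which text belongs to a region.
        System.out.println(header.contains(100, 300)); // inside the rectangle
        System.out.println(header.contains(400, 300)); // to the right of it
    }
}
```

With the library on the classpath, the resulting rectangle would be passed as the second argument of addRegion(regionName, rect).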

extractRegions

public void extractRegions(PDPage page)
            throws IOException
Process the page to extract the region text.
Parameters:
page - The page to extract the regions from.

flushText

protected void flushText()
            throws IOException
This will print the text to the output stream.
Overrides:
flushText in class PDFTextStripper

getRegions

public List getRegions()
Get the list of regions that have been set up.
Returns:
A list of java.lang.String objects to identify the region names.

getTextForRegion

public String getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().
Parameters:
regionName - The name of the region to get the text from.
Returns:
The text that was identified in that region.
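Example (illustrative only): the documented call order is addRegion, then extractRegions, then getTextForRegion. The sketch below imitates that order with a small stand-in class that uses only the JDK. RegionGroupingSketch and Glyph are hypothetical names, the page is a plain list of positioned strings rather than a PDPage, and the grouping rule (a point-in-rectangle test) is an assumption, not the library's implementation.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionGroupingSketch {
    /** Hypothetical stand-in for a positioned piece of text on a page. */
    static final class Glyph {
        final double x;
        final double y;
        final String text;

        Glyph(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> textByRegion = new LinkedHashMap<>();

    // Step 1: register a named region.
    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    // Names of the regions registered so far, in insertion order.
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Step 2: assign each glyph to every region whose rectangle contains it.
    void extractRegions(List<Glyph> page) {
        textByRegion.clear();
        for (String name : regions.keySet()) {
            textByRegion.put(name, new StringBuilder());
        }
        for (Glyph glyph : page) {
            for (Map.Entry<String, Rectangle2D> entry : regions.entrySet()) {
                if (entry.getValue().contains(glyph.x, glyph.y)) {
                    textByRegion.get(entry.getKey()).append(glyph.text);
                }
            }
        }
    }

    // Step 3: read the collected text. In this sketch the result is null
    // for an unknown region or when extractRegions has not run yet.
    String getTextForRegion(String regionName) {
        StringBuilder collected = textByRegion.get(regionName);
        return collected == null ? null : collected.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        sketch.addRegion("header", new Rectangle2D.Double(0, 0, 200, 50));
        sketch.addRegion("body", new Rectangle2D.Double(0, 50, 200, 300));

        List<Glyph> page = new ArrayList<>();
        page.add(new Glyph(10, 10, "Invoice"));
        page.add(new Glyph(10, 100, "Total: 42"));

        sketch.extractRegions(page);
        System.out.println(sketch.getTextForRegion("header")); // prints "Invoice"
        System.out.println(sketch.getTextForRegion("body"));   // prints "Total: 42"
    }
}
```

The stand-in only mirrors the method names and their order. How the real class behaves when getTextForRegion is called before extractRegions is not stated on this page.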

showCharacter

protected void showCharacter(TextPosition text)
Overrides:
showCharacter in class PDFTextStripper