org.pdfbox.util
Class PDFTextStripperByArea
public class PDFTextStripperByArea
extends PDFTextStripper
This will extract text from a specified region in the PDF.
void | addRegion(String regionName, Rectangle2D rect) - Add a new region to group text by.
void | extractRegions(PDPage page) - Process the page to extract the region text.
protected void | flushText() - This will print the text to the output stream.
List | getRegions() - Get the list of regions that have been set up.
String | getTextForRegion(String regionName) - Get the text for the region. This should be called after extractRegions().
protected void | showCharacter(TextPosition text)
Methods inherited from class org.pdfbox.util.PDFTextStripper
endDocument, endPage, endParagraph, flushText, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
Methods inherited from class org.pdfbox.util.PDFStreamEngine
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showCharacter, showString
PDFTextStripperByArea
public PDFTextStripperByArea()
throws IOException
Constructor.
addRegion
public void addRegion(String regionName,
Rectangle2D rect)
Add a new region to group text by.
Parameters:
regionName - The name of the region.
rect - The rectangle area to retrieve the text from.
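As a small illustration of the rect argument: Rectangle2D is abstract, so a caller would normally pass a concrete subclass such as Rectangle2D.Double. The sketch below uses only java.awt.geom and does not call PDFBox. The helper name region and the coordinates are made up for illustration, and how the rectangle maps onto PDF page coordinates is not covered here.

```java
import java.awt.geom.Rectangle2D;

class RegionRectSketch {

    // Hypothetical helper: builds a rectangle from its upper-left corner and size.
    static Rectangle2D region(double x, double y, double width, double height) {
        return new Rectangle2D.Double(x, y, width, height);
    }

    public static void main(String[] args) {
        // Illustrative coordinates only; real values depend on the page layout.
        Rectangle2D header = region(10, 280, 275, 60);

        // contains(x, y) reports whether a point lies inside the rectangle.
        System.out.println(header.contains(20, 300));   // prints "true"
        System.out.println(header.contains(500, 300));  // prints "false"
    }
}
```

A rectangle built this way could then be passed as the second argument to addRegion together with a region name of the caller's choosing.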
extractRegions
public void extractRegions(PDPage page)
throws IOException
Process the page to extract the region text.
Parameters:
page - The page to extract the regions from.
flushText
protected void flushText()
throws IOException
This will print the text to the output stream.
Overrides:
flushText in class PDFTextStripper
getRegions
public List getRegions()
Get the list of regions that have been set up.
Returns:
A list of java.lang.String objects that identify the region names.
getTextForRegion
public String getTextForRegion(String regionName)
Get the text for the region. This should be called after extractRegions().
Parameters:
regionName - The name of the region to get the text from.
Returns:
The text that was identified in that region.
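The call order described above (register regions, run the extraction, then read text per region) can be modeled with a small stand-in. This is a sketch under stated assumptions, not the PDFBox implementation: the class RegionGroupingSketch, the Glyph type, and the extract method are hypothetical, and it assumes a piece of text belongs to a region when its (x, y) position falls inside the region's rectangle.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for the region workflow. This is NOT the PDFBox implementation.
class RegionGroupingSketch {

    // Hypothetical stand-in for a positioned piece of text on a page.
    static class Glyph {
        final double x;
        final double y;
        final String text;

        Glyph(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    private final Map<String, Rectangle2D> regions = new LinkedHashMap<>();
    private final Map<String, StringBuilder> regionText = new LinkedHashMap<>();

    // Mirrors addRegion: register a named rectangle.
    void addRegion(String regionName, Rectangle2D rect) {
        regions.put(regionName, rect);
    }

    // Mirrors getRegions: the region names, in the order they were added.
    List<String> getRegions() {
        return new ArrayList<>(regions.keySet());
    }

    // Stand-in for extractRegions: append each glyph to every region containing it.
    void extract(List<Glyph> glyphs) {
        regionText.clear();
        for (String name : regions.keySet()) {
            regionText.put(name, new StringBuilder());
        }
        for (Glyph g : glyphs) {
            for (Map.Entry<String, Rectangle2D> e : regions.entrySet()) {
                if (e.getValue().contains(g.x, g.y)) {
                    regionText.get(e.getKey()).append(g.text);
                }
            }
        }
    }

    // Mirrors getTextForRegion. Returning null before extract() or for an unknown
    // name is a choice made for this sketch only.
    String getTextForRegion(String regionName) {
        StringBuilder sb = regionText.get(regionName);
        return sb == null ? null : sb.toString();
    }

    public static void main(String[] args) {
        RegionGroupingSketch sketch = new RegionGroupingSketch();
        sketch.addRegion("left", new Rectangle2D.Double(0, 0, 100, 100));
        sketch.addRegion("right", new Rectangle2D.Double(100, 0, 100, 100));

        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph(10, 10, "A"));
        glyphs.add(new Glyph(150, 10, "B"));
        glyphs.add(new Glyph(20, 10, "C"));

        sketch.extract(glyphs);
        System.out.println(sketch.getTextForRegion("left"));   // prints "AC"
        System.out.println(sketch.getTextForRegion("right"));  // prints "B"
    }
}
```

The model shows why getTextForRegion depends on the extraction step: the per-region text only exists once a page has been processed against the registered rectangles.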