org.pdfbox.util
Class PDFText2HTML
public class PDFText2HTML
Wrap stripped text in simple HTML, trying to form HTML paragraphs.
Paragraphs broken by pages, columns, or figures are not mended.
Author: jjb - http://www.johnjbarton.com
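The wrapping idea in the class description can be illustrated with a minimal, self-contained sketch. It does not use PDFBox: `ParagraphWrapSketch`, `escape`, and `wrap` are hypothetical names, and treating a blank line as a paragraph break is an assumption made for illustration, not necessarily the heuristic this class applies.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of org.pdfbox: wraps plain text lines in <p> elements.
class ParagraphWrapSketch {

    // Escape the characters that are special in HTML text content.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Assumption: a blank line ends the current paragraph.
    static String wrap(List<String> lines) {
        StringBuilder html = new StringBuilder();
        StringBuilder para = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                flush(para, html);
            } else {
                if (para.length() > 0) {
                    para.append(' ');
                }
                para.append(escape(line.trim()));
            }
        }
        flush(para, html); // emit the last paragraph, if any
        return html.toString();
    }

    // Write the pending paragraph (if non-empty) and reset the buffer.
    private static void flush(StringBuilder para, StringBuilder html) {
        if (para.length() > 0) {
            html.append("<p>").append(para).append("</p>\n");
            para.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.print(wrap(Arrays.asList("Tom & Jerry", "run <fast>", "", "Next page")));
    }
}
```

As the description notes for the real class, a sketch like this cannot mend a paragraph that was split by a page, column, or figure; it only sees the line sequence it is given.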
Methods inherited from class org.pdfbox.util.PDFTextStripper:
endDocument, endPage, endParagraph, flushText, getCharactersByArticle, getCurrentPageNo, getEndBookmark, getEndPage, getLineSeparator, getOutput, getPageSeparator, getStartBookmark, getStartPage, getText, getText, getWordSeparator, processPage, processPages, setEndBookmark, setEndPage, setLineSeparator, setPageSeparator, setShouldSeparateByBeads, setSortByPosition, setStartBookmark, setStartPage, setSuppressDuplicateOverlappingText, setWordSeparator, shouldSeparateByBeads, shouldSortByPosition, shouldSuppressDuplicateOverlappingText, showCharacter, startDocument, startPage, startParagraph, writeCharacters, writeText, writeText
Methods inherited from class org.pdfbox.util.PDFStreamEngine:
getColorSpaces, getCurrentPage, getFonts, getGraphicsStack, getGraphicsState, getGraphicsStates, getResources, getTextLineMatrix, getTextMatrix, getXObjects, processOperator, processOperator, processStream, processSubStream, registerOperatorProcessor, resetEngine, setColorSpaces, setFonts, setGraphicsStack, setGraphicsState, setGraphicsStates, setTextLineMatrix, setTextMatrix, showCharacter, showString
PDFText2HTML
public PDFText2HTML()
throws IOException
Constructor.
endParagraph
protected void endParagraph()
throws IOException
Write out the paragraph separator.
Overrides: endParagraph in class PDFTextStripper
getTitleGuess
protected String getTitleGuess()
Returns the guessed title of the document.
Returns: A string holding the guessed title of this document.
guessTitle
protected TextPosition guessTitle(Iterator textIter)
This method will attempt to guess the title of the document.
Parameters: textIter - The characters on the first page.
Returns: The text position that is guessed to be the title.
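How the title is guessed is not specified here. Purely as an illustration, the sketch below picks the item with the largest font size among the first page's text items. That heuristic is an assumption, not this method's documented behaviour. `TextItem` is a hypothetical stand-in for `TextPosition`, and `TitleGuessSketch` is not part of PDFBox.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for TextPosition: a piece of text plus its font size.
class TextItem {
    final String text;
    final float fontSize;

    TextItem(String text, float fontSize) {
        this.text = text;
        this.fontSize = fontSize;
    }
}

// Hypothetical sketch, not the library's algorithm.
class TitleGuessSketch {

    // Returns the first item with the largest font size, or null if there are no items.
    static TextItem guessTitle(Iterator<TextItem> textIter) {
        TextItem best = null;
        while (textIter.hasNext()) {
            TextItem item = textIter.next();
            if (best == null || item.fontSize > best.fontSize) {
                best = item;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        TextItem guess = guessTitle(Arrays.asList(
                new TextItem("ACME Corp", 10f),
                new TextItem("Annual Report", 24f),
                new TextItem("Introduction", 14f)).iterator());
        System.out.println(guess.text);
    }
}
```

Returning the matching item rather than just its string mirrors the documented signature, which returns a text position and leaves string extraction to getTitleGuess.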
isSuppressParagraphs
public boolean isSuppressParagraphs()
Returns: The current value of suppressParagraphs.
setSuppressParagraphs
public void setSuppressParagraphs(boolean shouldSuppressParagraphs)
Parameters: shouldSuppressParagraphs - The new value of suppressParagraphs.
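A flag like this would typically gate the paragraph separator written by endParagraph. The following sketch shows that pattern without depending on PDFBox. `SuppressParagraphsSketch`, `writeText(String)`, `result()`, and the `"<p>"` separator value are all assumptions made for illustration; only the getter and setter names come from this page.

```java
// Hypothetical sketch of a suppressParagraphs flag; not the library's implementation.
class SuppressParagraphsSketch {
    // Assumed separator value, chosen for illustration only.
    private static final String PARAGRAPH_SEPARATOR = "<p>";

    private final StringBuilder output = new StringBuilder();
    private boolean suppressParagraphs = false;

    public boolean isSuppressParagraphs() {
        return suppressParagraphs;
    }

    public void setSuppressParagraphs(boolean shouldSuppressParagraphs) {
        this.suppressParagraphs = shouldSuppressParagraphs;
    }

    // Assumption: when suppression is on, no separator is written at all.
    void endParagraph() {
        if (!suppressParagraphs) {
            output.append(PARAGRAPH_SEPARATOR);
        }
    }

    void writeText(String text) {
        output.append(text);
    }

    String result() {
        return output.toString();
    }

    public static void main(String[] args) {
        SuppressParagraphsSketch sketch = new SuppressParagraphsSketch();
        sketch.writeText("one");
        sketch.endParagraph();            // separator written
        sketch.setSuppressParagraphs(true);
        sketch.writeText("two");
        sketch.endParagraph();            // separator skipped
        System.out.println(sketch.result());
    }
}
```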
writeHeader
protected void writeHeader()
throws IOException
Write the header to the output document.
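What the header contains is not listed on this page. As a rough sketch, an HTML header writer might open the document and place the guessed title in a `<title>` element. `HeaderSketch` and `header` are hypothetical names, and the exact markup is an assumption rather than the output of writeHeader.

```java
// Hypothetical sketch of building an HTML header around a guessed title.
class HeaderSketch {

    // Builds the opening markup; the title is escaped so it cannot break the HTML.
    static String header(String titleGuess) {
        String safeTitle = titleGuess
                .replace("&", "&amp;")   // must run first so later entities are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;");
        StringBuilder buf = new StringBuilder();
        buf.append("<html><head>");
        buf.append("<title>").append(safeTitle).append("</title>");
        buf.append("</head>\n<body>\n");
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.print(header("A & B"));
    }
}
```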