Package com.itextpdf.text.pdf.parser
Class TaggedPdfReaderTool
java.lang.Object
com.itextpdf.text.pdf.parser.TaggedPdfReaderTool
- Direct Known Subclasses:
CompareTool.CmpTaggedPdfReaderTool
Converts a tagged PDF document into an XML file.
- Since:
- 5.0.2
-
Field Summary
Fields
Modifier and Type | Field | Description
protected PrintWriter | out | The writer object to which the XML will be written.
protected PdfReader | reader | The reader object from which the content streams are read.
-
Constructor Summary
Constructors
TaggedPdfReaderTool()
-
Method Summary
Modifier and Type | Method | Description
void | convertToXml(PdfReader reader, OutputStream os) | Parses a string with structured content.
void | convertToXml(PdfReader reader, OutputStream os, String charset) | Parses a string with structured content.
private static String | fixTagName(String tag) |
void | inspectChild | Inspects a child of a structured element.
void | inspectChildArray | If the child of a structured element is an array, we need to loop over the elements.
void | inspectChildDictionary | If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
void | inspectChildDictionary(PdfDictionary k, boolean inspectAttributes) | If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
void | parseTag(String tag, PdfObject object, PdfDictionary page) | Searches for a tag in a page.
protected String | xmlName |
-
Field Details
-
reader
The reader object from which the content streams are read.
-
out
The writer object to which the XML will be written.
-
-
Constructor Details
-
TaggedPdfReaderTool
public TaggedPdfReaderTool()
-
-
Method Details
-
convertToXml
Parses a string with structured content.
- Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
charset - the charset used to encode the data
- Throws:
IOException
- Since:
- 5.0.5
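The charset parameter decides how the XML text is turned into bytes on os. The sketch below is not iText code. It uses only the JDK, and the class CharsetWriterSketch and its helpers openWriter and encode are hypothetical names introduced for illustration. It shows one way an OutputStream and a charset name can be combined into a PrintWriter like the out field; whether TaggedPdfReaderTool builds its writer this way is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

class CharsetWriterSketch {

    // Hypothetical helper: wraps the target stream so that all text
    // written through the PrintWriter is encoded with the named charset.
    static PrintWriter openWriter(OutputStream os, String charset) {
        return new PrintWriter(new OutputStreamWriter(os, Charset.forName(charset)));
    }

    // Writes one XML fragment through the wrapped writer and returns the raw bytes.
    static byte[] encode(String xml, String charset) {
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        PrintWriter out = openWriter(os, charset);
        out.print(xml);
        out.flush(); // the writer buffers, so flush before reading the bytes
        return os.toByteArray();
    }

    public static void main(String[] args) {
        String xml = "<p>caf\u00e9</p>"; // 11 characters, one of them non-ASCII
        System.out.println(encode(xml, "UTF-8").length);      // 12
        System.out.println(encode(xml, "ISO-8859-1").length); // 11
    }
}
```

The same text produces different bytes per charset, which is why the charset passed here should match the encoding the consumer of the XML expects. With iText 5 on the classpath, the corresponding call would take the form new TaggedPdfReaderTool().convertToXml(reader, os, "UTF-8"), following the signature listed in the Method Summary.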
-
convertToXml
Parses a string with structured content. The output is done using the current charset.
- Parameters:
reader - the PdfReader that has access to the PDF file
os - the OutputStream to which the resulting XML will be written
- Throws:
IOException
-
inspectChild
Inspects a child of a structured element. This can be an array or a dictionary.
- Parameters:
k - the child to inspect
- Throws:
IOException
-
inspectChildArray
If the child of a structured element is an array, we need to loop over the elements.
- Parameters:
k - the child array to inspect
- Throws:
IOException
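The relationship between inspectChild and inspectChildArray can be pictured as a small recursive dispatch. The sketch below is not the iText implementation. It uses plain JDK collections as stand-ins (a List for a PDF array, a Map for a PDF dictionary), and the class ChildDispatchSketch with its helpers element and demo is hypothetical. The keys "S" (structure type) and "K" (kids) follow the entry names of PDF structure elements; how TaggedPdfReaderTool reads them internally is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChildDispatchSketch {
    // Structure types in the order the dictionaries are reached.
    private final List<String> visited = new ArrayList<>();

    // A child can be an array or a dictionary; anything else (including null) is ignored here.
    void inspectChild(Object k) {
        if (k instanceof List) {
            inspectChildArray((List<?>) k);
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k);
        }
    }

    // An array child is handled by looping over its elements.
    void inspectChildArray(List<?> k) {
        for (Object element : k) {
            inspectChild(element);
        }
    }

    // A dictionary child is recorded, then its own kids are inspected.
    void inspectChildDictionary(Map<?, ?> k) {
        visited.add(String.valueOf(k.get("S")));
        inspectChild(k.get("K"));
    }

    // Hypothetical helper that builds a stand-in structure element.
    static Map<String, Object> element(String type, Object kids) {
        Map<String, Object> dictionary = new LinkedHashMap<>();
        dictionary.put("S", type);
        dictionary.put("K", kids);
        return dictionary;
    }

    static String demo() {
        Object tree = element("Document", Arrays.asList(
                element("H1", null),
                element("P", Arrays.asList(element("Span", null)))));
        ChildDispatchSketch sketch = new ChildDispatchSketch();
        sketch.inspectChild(tree);
        return String.join(",", sketch.visited);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Document,H1,P,Span
    }
}
```

The traversal is depth-first, so elements are reached in document order, which is the order an XML serialization would need.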
-
inspectChildDictionary
If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
- Parameters:
k - the child dictionary to inspect
- Throws:
IOException
-
inspectChildDictionary
If the child of a structured element is a dictionary, we inspect the child; we may also draw a tag.
- Parameters:
k - the child dictionary to inspect
- Throws:
IOException
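"Drawing a tag" for a dictionary child can be read as writing an opening XML tag, inspecting the kids, and then writing the closing tag. The sketch below illustrates that reading with JDK types only. It is not the iText implementation: Map and List stand in for PDF dictionaries and arrays, a String leaf stands in for text that would come from the page content, and the class TagDrawingSketch with its helpers result and demo is hypothetical. The source does not describe what inspectAttributes does; treating it as a switch that emits the entries of an "A" (attributes) dictionary as XML attributes is an assumption.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TagDrawingSketch {
    private final StringWriter buffer = new StringWriter();
    private final PrintWriter out = new PrintWriter(buffer);

    void inspectChild(Object k) {
        if (k instanceof List) {
            for (Object element : (List<?>) k) {
                inspectChild(element);
            }
        } else if (k instanceof Map) {
            inspectChildDictionary((Map<?, ?>) k, true);
        } else if (k instanceof String) {
            out.print(k); // stand-in for text found in the page content
        }
    }

    // Writes <tag ...>, inspects the kids, then writes </tag>.
    // No XML escaping is done; this is a sketch, not production code.
    void inspectChildDictionary(Map<?, ?> k, boolean inspectAttributes) {
        if (k == null) {
            return;
        }
        Object tag = k.get("S");
        if (tag == null) { // no structure type: nothing to draw, just descend
            inspectChild(k.get("K"));
            return;
        }
        out.print("<" + tag);
        if (inspectAttributes && k.get("A") instanceof Map) {
            for (Map.Entry<?, ?> a : ((Map<?, ?>) k.get("A")).entrySet()) {
                out.print(" " + a.getKey() + "=\"" + a.getValue() + "\"");
            }
        }
        out.print(">");
        inspectChild(k.get("K"));
        out.print("</" + tag + ">");
    }

    String result() {
        out.flush();
        return buffer.toString();
    }

    // Hypothetical demo: one paragraph element with one attribute and one text leaf.
    static String demo(boolean inspectAttributes) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        attributes.put("lang", "en");
        Map<String, Object> paragraph = new LinkedHashMap<>();
        paragraph.put("S", "P");
        paragraph.put("A", attributes);
        paragraph.put("K", Arrays.asList("Hello"));
        TagDrawingSketch sketch = new TagDrawingSketch();
        sketch.inspectChildDictionary(paragraph, inspectAttributes);
        return sketch.result();
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // <P lang="en">Hello</P>
        System.out.println(demo(false)); // <P>Hello</P>
    }
}
```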
-
xmlName
-
fixTagName
-
parseTag
Searches for a tag in a page.
- Parameters:
tag - the name of the tag
object - an identifier to find the marked content
page - a page dictionary
- Throws:
IOException
-