Package org.htmlunit.cyberneko
Class HTMLScanner.ContentScanner
- java.lang.Object
-
- org.htmlunit.cyberneko.HTMLScanner.ContentScanner
-
- All Implemented Interfaces:
HTMLScanner.Scanner
- Enclosing class:
- HTMLScanner
public class HTMLScanner.ContentScanner extends java.lang.Object implements HTMLScanner.Scanner
The primary HTML document scanner.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private XMLAttributesImpl | attributes_ | Attributes.
private QName | qName_ | A qualified name.
private java.lang.String | scanStartElement_ |
-
Constructor Summary
Constructors
Constructor | Description
ContentScanner() |
-
Method Summary
All Methods / Instance Methods / Concrete Methods
Modifier and Type | Method | Description
private boolean | changeEncoding(java.lang.String charset) | Tries to change the encoding used to read the input stream to the specified one.
private void | eof() |
private java.lang.String | removeSpaces(java.lang.String content) | Removes all whitespace from the string.
int | scan(boolean complete) | Scan.
protected int | scanAttribute(XMLAttributesImpl attributes, boolean[] empty) | Scans a real attribute.
protected int | scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) |
protected int | scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) |
protected int | scanCDATA() |
protected int | scanCDataContent(XMLString xmlString) |
protected void | scanCharacters() |
protected int | scanComment() |
protected int | scanCommentContent(XMLString buffer) |
protected void | scanEndElement() |
protected int | scanPI() |
protected int | scanStartElement(boolean[] empty) | Scans a start element.
private void | scanUntilEndTag(java.lang.String tagName) | Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
-
-
-
Field Detail
-
qName_
private final QName qName_
A qualified name.
-
attributes_
private final XMLAttributesImpl attributes_
Attributes.
-
scanStartElement_
private java.lang.String scanStartElement_
-
-
Method Detail
-
scan
public int scan(boolean complete) throws java.io.IOException
Scan.
- Specified by:
scan in interface HTMLScanner.Scanner
- Parameters:
complete - True if the scanner should not return until scanning is complete.
- Returns:
- True if additional scanning is required.
- Throws:
java.io.IOException - Thrown if an I/O error occurs.
-
eof
private void eof()
-
scanUntilEndTag
private void scanUntilEndTag(java.lang.String tagName) throws java.io.IOException
Scans the content of <noscript>: it is not parsed but is treated as plain text when the feature HTMLScanner.PARSE_NOSCRIPT_CONTENT is set to false.
- Parameters:
tagName - the tag for which content is scanned (one of "noscript", "noframes", "iframe")
- Throws:
java.io.IOException - on error
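
The idea of treating everything up to the closing tag as unparsed text can be illustrated with a small standalone sketch. This is not the library's implementation: the class RawTextSketch and the method readUntilEndTag are hypothetical names, and the sketch works on a String rather than on the scanner's input buffer. It also simplifies matters by not checking the character that follows the tag name.

```java
/** Hypothetical illustration; not part of org.htmlunit.cyberneko. */
final class RawTextSketch {

    /**
     * Returns the text from {@code start} up to (not including) the first
     * case-insensitive occurrence of {@code </tagName}. If no end tag is
     * found, the rest of the input is returned.
     */
    static String readUntilEndTag(String input, int start, String tagName) {
        String endTag = "</" + tagName;
        for (int i = start; i + endTag.length() <= input.length(); i++) {
            // regionMatches(true, ...) compares while ignoring case
            if (input.regionMatches(true, i, endTag, 0, endTag.length())) {
                return input.substring(start, i);
            }
        }
        return input.substring(start);
    }

    public static void main(String[] args) {
        String html = "<noscript><p>Enable JavaScript</p></NOSCRIPT><p>after</p>";
        int contentStart = "<noscript>".length();
        // The markup inside <noscript> comes back verbatim, as plain text.
        System.out.println(readUntilEndTag(html, contentStart, "noscript"));
    }
}
```

In this sketch the nested <p> element is returned as raw characters instead of being tokenized, which mirrors the "considered as plain text" behaviour described above.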
-
scanCharacters
protected void scanCharacters() throws java.io.IOException
- Throws:
java.io.IOException
-
scanCDATA
protected int scanCDATA() throws java.io.IOException
- Throws:
java.io.IOException
-
scanComment
protected int scanComment() throws java.io.IOException
- Throws:
java.io.IOException
-
scanCommentContent
protected int scanCommentContent(XMLString buffer) throws java.io.IOException
- Throws:
java.io.IOException
-
scanCDataContent
protected int scanCDataContent(XMLString xmlString) throws java.io.IOException
- Throws:
java.io.IOException
-
scanPI
protected int scanPI() throws java.io.IOException
- Throws:
java.io.IOException
-
scanStartElement
protected int scanStartElement(boolean[] empty) throws java.io.IOException
Scans a start element.
- Parameters:
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- ename
- Throws:
java.io.IOException - in case of I/O problems
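
The empty parameter follows a common Java idiom: a one-element array serves as an out-parameter, so a method can report a second result alongside its return value. The sketch below shows only that idiom. EmptyFlagSketch and parseStartTag are hypothetical names, the method returns the element name as a String for readability (the real method returns an int), and the input is assumed to be a complete tag beginning with '<'.

```java
/** Hypothetical illustration; not part of org.htmlunit.cyberneko. */
final class EmptyFlagSketch {

    /**
     * Extracts the element name from a complete start tag such as
     * {@code <br/>} or {@code <div class="x">}. The primary result is the
     * returned name; empty[0] is set to true when the tag ends with "/>".
     */
    static String parseStartTag(String tag, boolean[] empty) {
        empty[0] = tag.endsWith("/>");   // second "return value"
        int end = 1;                     // skip the leading '<'
        while (end < tag.length() && Character.isLetterOrDigit(tag.charAt(end))) {
            end++;
        }
        return tag.substring(1, end);
    }

    public static void main(String[] args) {
        boolean[] empty = {false};
        String name = parseStartTag("<br/>", empty);
        System.out.println(name + " empty=" + empty[0]); // prints "br empty=true"
    }
}
```

The caller allocates the array once and can reuse it across calls, which avoids creating a result object for every scanned tag.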
-
removeSpaces
private java.lang.String removeSpaces(java.lang.String content)
Removes all whitespace from the string.
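
Because removeSpaces is private and its body is not shown here, the following is only a sketch of what "removing all whitespace" could look like. It assumes that whitespace means any character for which Character.isWhitespace returns true; the library may use a narrower or wider definition. The class name RemoveSpacesSketch is hypothetical.

```java
/** Hypothetical illustration; not part of org.htmlunit.cyberneko. */
final class RemoveSpacesSketch {

    /** Returns a copy of {@code content} without any whitespace characters. */
    static String removeSpaces(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // Assumption: "whitespace" is whatever Character.isWhitespace accepts.
            if (!Character.isWhitespace(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "text/html;charset=UTF-8"
        System.out.println(removeSpaces(" text/html;\tcharset = UTF-8 \n"));
    }
}
```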
-
changeEncoding
private boolean changeEncoding(java.lang.String charset)
Tries to change the encoding used to read the input stream to the specified one.
- Parameters:
charset - the charset that should be used
- Returns:
true when the encoding has been changed
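
The "try to change, report success" contract can be sketched with the standard java.nio.charset API. This is an assumption-laden illustration, not the scanner's code: EncodingSwitchSketch, its fields, and decode() are hypothetical, the starting charset of ISO-8859-1 is chosen arbitrarily, and the sketch returns true whenever the requested charset is supported, which may differ from how the library decides that the encoding "has been changed".

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration; not part of org.htmlunit.cyberneko. */
final class EncodingSwitchSketch {
    private final byte[] bytes;
    // Assumption for the sketch: decoding starts out as ISO-8859-1.
    private Charset current = StandardCharsets.ISO_8859_1;

    EncodingSwitchSketch(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Returns true if the named charset is supported and is now in use. */
    boolean changeEncoding(String charset) {
        try {
            if (charset == null || !Charset.isSupported(charset)) {
                return false;            // unknown charset: keep the old one
            }
            current = Charset.forName(charset);
            return true;
        } catch (IllegalCharsetNameException e) {
            return false;                // syntactically illegal name
        }
    }

    /** Decodes the raw bytes with the charset currently in use. */
    String decode() {
        return new String(bytes, current);
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is one character in UTF-8 but two in ISO-8859-1.
        EncodingSwitchSketch s = new EncodingSwitchSketch(new byte[] {(byte) 0xC3, (byte) 0xA9});
        System.out.println(s.decode().length());      // prints 2
        System.out.println(s.changeEncoding("UTF-8")); // prints true
        System.out.println(s.decode().length());      // prints 1
    }
}
```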
-
scanAttribute
protected int scanAttribute(XMLAttributesImpl attributes, boolean[] empty) throws java.io.IOException
Scans a real attribute.
- Parameters:
attributes - The list of attributes.
empty - Used as a second return value to indicate whether the start element tag is empty (e.g. "/>").
- Returns:
- success
- Throws:
java.io.IOException - in case of I/O problems
-
scanAttributeUnquotedValue
protected int scanAttributeUnquotedValue(HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue) throws java.io.IOException
- Throws:
java.io.IOException
-
scanAttributeQuotedValue
protected int scanAttributeQuotedValue(int currentQuote, HTMLScanner.CurrentEntity currentEntity, XMLString attribValue, XMLString plainAttribValue, boolean normalizeAttributes) throws java.io.IOException
- Throws:
java.io.IOException
-
scanEndElement
protected void scanEndElement() throws java.io.IOException
- Throws:
java.io.IOException
-
-