Class HTML5::HTMLTokenizer
In: lib/html5/tokenizer.rb
Parent: Object

This class takes care of tokenizing HTML.

  • @current_token Holds the token that is currently being processed.
  • @state Holds a reference to the method to be invoked… XXX
  • @states Holds a mapping between states and methods that implement the state.
  • @stream Points to HTMLInputStream object.

Methods

Attributes

content_model_flag  [RW] 
current_token  [RW] 
stream  [R] 

Public Class methods

XXX need to fix documentation

Public Instance methods

This function returns either U+FFFD or the character based on the decimal or hexadecimal representation. It also discards ";" if present. If not present @token_queue << {:type => :ParseError}" is invoked.

XXX AT Perhaps we should have Hixie run some evaluation on billions of documents to figure out what the order of the various if and elsif statements should be.

This is where the magic happens.

We do our usually processing through the states and when we have a token to return we yield the token which pauses processing until the next token is requested.

This method is a generic handler for emitting the tags. It also sets the state to "data" because that‘s what‘s needed after a token has been emitted.

This method replaces the need for "entityInAttributeValueState".

If the next character is a ’>’, convert the current_token into an EmptyTag

[Validate]