| Path: | README.rdoc |
| Last Update: | Sat Feb 23 07:15:13 +0000 2019 |
ParseTree is great: it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to Ruby code & S-expressions, BUT ParseTree doesn't work on 1.9.* & JRuby.
RubyParser is great, and it works on any Ruby (though, of course, it is not yet 100% compatible with 1.8.7 & 1.9.* syntax), BUT it works only with static code.
I truly enjoy using the above tools, but in my other projects, the absence of ParseTree on the other rubies forces me to hand-bake my own solution each time to extract the proc code I need at runtime. This is frustrating: the solution is never perfect, and I'm reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).
Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper around it; otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. The result is further processed with RubyParser & Ruby2Ruby to ensure 100% compatibility with ParseTree (yup, there is no denying that I really like ParseTree).
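The "use ParseTree when available, otherwise fall back to the scanner" design can be pictured as a simple load-time switch. This is a minimal sketch under assumptions: `parse_tree_available?` and `BACKEND` are hypothetical names, not Sourcify's actual code; only `require 'parse_tree'` (ParseTree's library name) comes from the real gem.

```ruby
# Hypothetical backend switch (not Sourcify's actual code): prefer
# ParseTree when it can be loaded, otherwise fall back to a static scanner.
def parse_tree_available?
  require 'parse_tree'   # raises LoadError when ParseTree is not installed
  true
rescue LoadError
  false
end

BACKEND = parse_tree_available? ? :parse_tree : :static_scanner
puts "Using backend: #{BACKEND}"
```

Deciding once at load time keeps the per-proc code paths free of repeated feature checks.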
The religiously standard way:
$ gem install ParseTree sourcify
Or on 1.9.* or JRuby:
$ gem install ruby_parser file-tail sourcify
Sourcify adds 4 methods to Proc: #to_source, #to_sexp, #to_raw_source and #source_location.
Proc#to_source returns the code representation of the proc:
require 'sourcify'
lambda { x + y }.to_source
# >> "proc { (x + y) }"
proc { x + y }.to_source
# >> "proc { (x + y) }"
Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to only extract the body of the proc by passing in {:strip_enclosure => true}:
lambda { x + y }.to_source(:strip_enclosure => true)
# >> "(x + y)"
lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
# >> "(i + 2)"
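To make the effect of `:strip_enclosure` concrete, here is a sketch that performs the same stripping on an already-generated one-line source string. `strip_proc_enclosure` is a hypothetical helper, not part of Sourcify, and it only recognises the single-line `proc { ... }` shape shown above.

```ruby
# Hypothetical helper (not part of Sourcify): strips a one-line
# "proc { ... }" wrapper, including an optional |args| list, from a
# source string such as the ones returned by Proc#to_source.
def strip_proc_enclosure(source)
  m = source.match(/\Aproc \{\s*(?:\|[^|]*\|\s*)?(.*?)\s*\}\z/m)
  m ? m[1] : source  # leave anything we don't recognise untouched
end

puts strip_proc_enclosure("proc { (x + y) }")      # (x + y)
puts strip_proc_enclosure("proc { |i| (i + 2) }")  # (i + 2)
```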
Proc#to_sexp returns the S-expression of the proc:
require 'sourcify'
x = 1
lambda { x + y }.to_sexp
# >> s(:iter,
# >> s(:call, nil, :proc, s(:arglist)),
# >> nil,
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))
To extract only the body of the proc:
lambda { x + y }.to_sexp(:strip_enclosure => true)
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist))))
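Structurally, stripping the enclosure of an S-expression means dropping the outer `:iter` node and keeping its body. The sketch below models S-expressions as plain nested Arrays (RubyParser's `Sexp` is an Array subclass) and assumes the `[:iter, call, args, body]` layout shown in the output above; `strip_iter` is a hypothetical name.

```ruby
# S-expressions modelled as plain nested Arrays. In the layout shown
# above, an :iter node is [:iter, call, args, body]; stripping the
# enclosure keeps only body. Other nodes are returned unchanged.
def strip_iter(sexp)
  sexp.first == :iter ? sexp[3] : sexp
end

body = [:call, [:lvar, :x], :+, [:arglist, [:call, nil, :y, [:arglist]]]]
iter = [:iter, [:call, nil, :proc, [:arglist]], nil, body]
p strip_iter(iter) == body  # true
```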
Proc#to_raw_source returns the raw code enclosed within the proc, including fluff like comments, unlike Proc#to_source, which returns code that retains only the functional aspects:
lambda do |i|
i+1 # (blah)
end.to_raw_source
# >> "proc do |i|
# >> i+1 # (blah)
# >> end"
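Raw source extraction works on the file text rather than on a parsed tree, which is why comments survive. The sketch below is a deliberately naive, line-based illustration of that idea; it is NOT Sourcify's ragel-generated scanner, and `naive_raw_block` is a hypothetical name. It reads from a string so it can run on its own.

```ruby
# Naive line-based scanning (not Sourcify's scanner): starting at a
# 1-based line number, collect lines until do/end nesting closes.
# It does not handle strings, heredocs, or keywords inside comments,
# and it stops after the first line when that line opens no do block
# (e.g. a one-line brace proc).
def naive_raw_block(text, lineno)
  depth = 0
  out = []
  text.lines.drop(lineno - 1).each do |line|
    out << line
    depth += line.scan(/\bdo\b/).size
    depth -= line.scan(/\bend\b/).size
    break if depth <= 0
  end
  out.join
end

code = "x = 1\nblk = lambda do |i|\n  i+1 # (blah)\nend\nputs blk.call(1)\n"
print naive_raw_block(code, 2)  # the three lambda lines, comment included
```

The gotchas listed further down (eval'd code, several procs on one line) follow directly from this kind of text-based approach.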
NOTE: Since this extracts raw code, it relies on static code scanning (even when running in ParseTree mode), so the gotchas of static code scanning always apply.
Proc#source_location is, by default, only available on 1.9.*; it is added (as a bonus) to provide consistency under 1.8.*:
# /tmp/test.rb
require 'sourcify'
lambda { x + y }.source_location
# >> ["/tmp/test.rb", 5]
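One plausible way to backfill `source_location` on 1.8.* is to parse `Proc#inspect`, which on 1.8 embeds the file and line (e.g. `#<Proc:0x00000000@/tmp/test.rb:5>`). This is an assumption for illustration, not a description of Sourcify's implementation; `location_from_inspect` is hypothetical, and newer rubies format `#inspect` differently.

```ruby
# Assumption: under 1.8.*, Proc#inspect looks like
# "#<Proc:0x00000000@/tmp/test.rb:5>", so [file, lineno] can be
# recovered from it. Returns nil when no location is embedded.
def location_from_inspect(inspected)
  m = inspected.match(/@(.+):(\d+)>\z/)
  m && [m[1], m[2].to_i]
end

p location_from_inspect("#<Proc:0x00000000@/tmp/test.rb:5>")
# ["/tmp/test.rb", 5]
```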
Performance is embarrassing for now. Benchmarking the processing of 500 procs (in the ObjectSpace of an average Rails project) yields the following:
ruby                                    user     system      total        real
ruby-1.8.7-p299 (w ParseTree)      10.270000   0.010000  10.280000 ( 10.311430)
ruby-1.8.7-p299 (static scanner)   14.120000   0.080000  14.200000 ( 14.283817)
ruby-1.9.1-p376 (static scanner)   17.380000   0.050000  17.430000 ( 17.405966)
jruby-1.5.2 (static scanner)       21.318000   0.000000  21.318000 ( 21.318000)
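Dividing each total by the 500 procs gives a per-proc cost that is easier to reason about: roughly 21 ms with ParseTree on 1.8.7, up to about 43 ms with the static scanner on JRuby. The small script below only reuses the totals from the table; `ms_per_proc` is an illustrative helper.

```ruby
# Totals (seconds) copied from the benchmark table; 500 procs per run.
TOTALS = {
  "ruby-1.8.7-p299 (w ParseTree)"    => 10.28,
  "ruby-1.8.7-p299 (static scanner)" => 14.20,
  "ruby-1.9.1-p376 (static scanner)" => 17.43,
  "jruby-1.5.2 (static scanner)"     => 21.318,
}
PROC_COUNT = 500

# Average milliseconds spent per proc for a given total.
def ms_per_proc(total_secs, count = PROC_COUNT)
  total_secs * 1000.0 / count
end

TOTALS.each do |name, secs|
  printf("%-34s %6.2f ms/proc\n", name, ms_per_proc(secs))
end
```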
Since I'm still pretty new to ragel, the code scanner will probably become better & faster as my knowledge & skills with ragel improve. Also, instead of generating a pure Ruby scanner, we could generate native code (e.g. C, Java, or whatever). As I'm a C & Java noob, this will probably take some time to realize.
Nothing beats ParseTree's ability to access the runtime AST; it is a very powerful feature. The scanner-based (static) implementation suffers from the following gotchas:
Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected *[file, lineno]*. The following will not work:
def test
eval('lambda { x + y }')
end
test.source_location
# >> ["(eval)", 1]
test.to_source
# >> Sourcify::CannotParseEvalCodeError
The same applies to *Blah#to_proc* & *&:blah*:
klass = Class.new do
def aa(&block); block ; end
def bb; 1+2; end
end
klass.new.method(:bb).to_proc.to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError
klass.new.aa(&:bb).to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError
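Both limitations can often be spotted up front from `Proc#source_location` (1.9+). The pre-check below is a hypothetical sketch, not part of Sourcify: eval'd procs report a pseudo-file such as `"(eval)"` (newer rubies use `"(eval at FILE:LINE)"`), and procs from `Symbol#to_proc` report no location. It does not catch `Method#to_proc` for Ruby-defined methods, whose location points at the `def` rather than at a proc literal.

```ruby
# Hypothetical pre-check (not part of Sourcify): a static scanner needs a
# real [file, lineno] to read from.
def statically_scannable?(prc)
  file, _line = prc.source_location
  return false if file.nil?                 # no location reported at all
  return false if file.start_with?("(eval") # eval'd code has no real file
  File.exist?(file)
end

p statically_scannable?(eval("lambda { 1 + 2 }"))  # false
p statically_scannable?(:upcase.to_proc)           # false
```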
Sometimes we may have multiple procs on a single line. Sourcify can handle this as long as the subject proc's arity is unique among them:
# Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
b2.to_source
# >> "proc { (1 + 2) }"
# Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 }
b2.to_source
# >> raises Sourcify::MultipleMatchingProcsPerLineError
As observed, the above does not work when multiple procs on the same line share the same arity. Furthermore, this bug under 1.8.* affects the accuracy of this approach.
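The arity rule can be sketched as a filter over the candidate proc literals found on one line: keep those whose parameter count matches the subject proc's arity, and give up if more than one remains. Everything here is hypothetical (`literal_arity`, `pick_by_arity`, `AmbiguousProcError`) and only handles brace literals with simple parameter lists; it is not Sourcify's implementation.

```ruby
# Hypothetical disambiguation by arity (not Sourcify's implementation).
class AmbiguousProcError < StandardError; end

# Simplistic: counts comma-separated names between |...| in a brace
# literal; ignores splats, optional args and destructuring.
def literal_arity(src)
  m = src.match(/\{\s*\|([^|]*)\|/)
  m ? m[1].split(",").size : 0
end

# Returns the single matching literal (nil if none), raises if ambiguous.
def pick_by_arity(candidates, arity)
  matches = candidates.select { |src| literal_arity(src) == arity }
  raise AmbiguousProcError, "#{matches.size} procs with arity #{arity}" if matches.size > 1
  matches.first
end

line = ["lambda {|a| a+1 }", "lambda { 1+2 }"]
p pick_by_arity(line, lambda { 1+2 }.arity)  # "lambda { 1+2 }"
```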
To better narrow down the scanning, try:
x = lambda { proc { :blah } }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError
x.to_source(:attached_to => :lambda)
# >> "proc { proc { :blah } }"
x = lambda { lambda { :blah } }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError
x.to_source(:ignore_nested => true)
# >> "proc { lambda { :blah } }"
x, y = lambda { def secret; 1; end }, lambda { :blah }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError
x.to_source{|body| body =~ /^(.*\W|)def\W/ }
# >> 'proc { def secret; 1; end }'
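Conceptually, each of these options is one more filter over the candidate procs found on the line. The model below is a hypothetical sketch, assuming each candidate records the keyword it is attached to, its body text, and whether it is nested; `Candidate` and `narrow` are invented names, and the option semantics are inferred from the examples above rather than taken from Sourcify's internals.

```ruby
# Hypothetical model of the narrowing options (not Sourcify's internals).
Candidate = Struct.new(:keyword, :body, :nested)

# Applies each requested filter in turn; exactly one candidate must remain.
def narrow(candidates, attached_to: nil, ignore_nested: false, &matcher)
  list = candidates
  list = list.select { |c| c.keyword == attached_to } if attached_to
  list = list.reject(&:nested) if ignore_nested
  list = list.select { |c| matcher.call(c.body) } if matcher
  raise ArgumentError, "#{list.size} matching candidates" unless list.size == 1
  list.first
end

# x = lambda { proc { :blah } }
nested = [Candidate.new(:lambda, "proc { :blah }", false),
          Candidate.new(:proc, ":blah", true)]
p narrow(nested, attached_to: :lambda).body  # "proc { :blah }"

# x, y = lambda { def secret; 1; end }, lambda { :blah }
pair = [Candidate.new(:lambda, "def secret; 1; end", false),
        Candidate.new(:lambda, ":blah", false)]
match = narrow(pair) { |body| body =~ /^(.*\W|)def\W/ }
p match.body  # "def secret; 1; end"
```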
Please refer to the rdoc for more details.
Under the hood, Sourcify relies on RubyParser to yield S-expressions, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError for any code that is not compatible with 1.8.6.
The Sourcify spec suite currently passes on the following rubies:
Besides its own spec suite, sourcify has also been tested to handle:
ObjectSpace.each_object(Proc) {|o| puts o.to_source }
For projects:
(TODO: the more the merrier)
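When running the ObjectSpace one-liner on a large project, it can help to tally failures by error class instead of stopping at the first exception. The sketch below is a hypothetical helper, not part of Sourcify; its default converter calls `#to_source`, which assumes sourcify has already been required, and any callable taking one proc can be passed instead.

```ruby
# Runs `convert` over each proc and tallies outcomes by error class
# instead of stopping at the first exception.
def tally_conversions(procs = ObjectSpace.each_object(Proc), convert = :to_source.to_proc)
  tally = Hash.new(0)
  procs.each do |prc|
    begin
      convert.call(prc)
      tally["ok"] += 1
    rescue StandardError => e
      tally[e.class.name] += 1  # e.g. "Sourcify::MultipleMatchingProcsPerLineError"
    end
  end
  tally
end

# With sourcify loaded:
#   p tally_conversions
```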
Projects using sourcify include:
Sourcify is heavily inspired by many ideas gathered from the ruby community:
The sad fact that Proc#to_source wouldn't be available in the near future:
Copyright (c) 2010 NgTzeYang. See LICENSE for details.