14/01/2010
I wrote a simple depth-first search (DFS) in Ruby:
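# The array is used as a stack (push and pop both work at its end),
# which is what makes the search depth-first.
def dfs(node, value, stack = [])
  return false if node.nil?          # nothing left to visit
  return true if node.data == value
  # Right is pushed before left, so the left child is popped (visited) first.
  stack.push node.right unless node.right.nil?
  stack.push node.left unless node.left.nil?
  dfs(stack.pop, value, stack)
end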
The Node class it works on:
class Node
  attr_accessor :left, :right, :data
  def initialize(data, left = nil, right = nil)
    @data, @left, @right = data, left, right
  end
end
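Because `dfs` recurses once per visited node, a very deep tree could exhaust Ruby's call stack. Below is a minimal sketch of the same traversal written as a loop with an explicit stack. It repeats a `Node` class with a three-argument constructor so it runs on its own; `dfs_iterative` and `dfs_order` are hypothetical names used only for illustration. Replacing `pop` with `shift` would turn it into a breadth-first search.

```ruby
# Self-contained Node with a convenience constructor.
class Node
  attr_accessor :left, :right, :data

  def initialize(data, left = nil, right = nil)
    @data, @left, @right = data, left, right
  end
end

# Iterative depth-first search: the array is used as a stack,
# so the most recently discovered node is visited next.
def dfs_iterative(root, value)
  stack = [root].compact
  until stack.empty?
    node = stack.pop
    return true if node.data == value
    # Push right first so the left subtree is explored first (pre-order).
    stack.push(node.right) if node.right
    stack.push(node.left) if node.left
  end
  false
end

# Records the visiting order, to show that the traversal is depth-first.
def dfs_order(root)
  order = []
  stack = [root].compact
  until stack.empty?
    node = stack.pop
    order << node.data
    stack.push(node.right) if node.right
    stack.push(node.left) if node.left
  end
  order
end

tree = Node.new(1,
                Node.new(2, Node.new(4), Node.new(5)),
                Node.new(3))

p dfs_iterative(tree, 5)  # => true
p dfs_iterative(tree, 9)  # => false
p dfs_order(tree)         # => [1, 2, 4, 5, 3]
```

The left subtree (2, 4, 5) is fully explored before the right child 3 is visited, which is the depth-first order.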
13/01/2010
This has been covered many times already, but I've created this little snippet to help me remember the difference between the various kinds of variables in Ruby:
class A
  @@foo = "class variable of the class A"
  @foo = "class instance variable of the class A"

  def instance_method
    @foo = "instance variable of the class A"
  end

  def self.class_method1
    # class variables are visible to and shared by the instance methods,
    # the class methods and all subclasses
    @@foo
  end

  def self.class_method2
    # class instance variables belong to this class object only:
    # its class methods see them, instances and subclasses do not
    @foo
  end
end
p A.new.instance_method # instance variable of the class A
p A.class_method1 # class variable of the class A
p A.class_method2 # class instance variable of the class A
class B < A
  @@foo = "class variable of the class B"
  @foo = "class instance variable of the class B"
end
p B.class_method1 # class variable of the class B
# A and B share a single class variable, so B's assignment overwrote A's value !!!
p A.class_method1 # class variable of the class B
p B.class_method2 # class instance variable of the class B
# class instance variable in A is NOT overwritten by the one in B !!!
p A.class_method2 # class instance variable of the class A
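To check which object actually owns each variable, Ruby's reflection methods help. Below is a sketch using a trimmed-down copy of `A` and `B` (so it runs standalone) and the standard `Module#class_variables`, `Module#class_variable_get` and singleton-class `attr_accessor`. The accessor is the common idiom for exposing a class instance variable, which gives each subclass its own value; passing `false` to `class_variables` (supported in Ruby 2.0 and later, as far as I know) excludes inherited class variables.

```ruby
class A
  @@foo = "class variable of the class A"
  @foo = "class instance variable of the class A"

  # Accessor for the class instance variable, defined on A's singleton class
  # and therefore inherited by subclasses as a class method.
  class << self
    attr_accessor :foo
  end
end

class B < A
  @@foo = "class variable of the class B"
  @foo = "class instance variable of the class B"
end

# B owns no class variable of its own: its assignment went to A's @@foo.
p A.class_variables(false)      # => [:@@foo]
p B.class_variables(false)      # => []
p A.class_variable_get(:@@foo)  # => "class variable of the class B"

# Each class object keeps its own class instance variable.
p A.foo  # => "class instance variable of the class A"
p B.foo  # => "class instance variable of the class B"
```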
11/01/2010
Anemone is a pretty cool Ruby DSL for web crawling. I used it together with Hpricot to get a feel for what's possible. Below is a simple example that crawls and scrapes data from otodom, a popular Polish real estate website:
require 'rubygems'
require 'anemone'
require 'hpricot'
# otodom.pl
Anemone.crawl("http://otodom.pl/index.php?mod=search&act=searchResults&qid=46911208",
              {:storage => Anemone::Storage.PStore("crawl1.pstore")}) do |anemone|
  # filter out useless pages: follow only search-result pages and offer details pages
  anemone.focus_crawl do |page|
    page.links.delete_if do |x|
      (x.to_s =~ /mod=search&act=searchResults&qid=/).nil? and
        (x.to_s =~ /[a-zA-Z]+-id[0-9]*\.html$/).nil?
    end
  end
  # process offer details pages
  anemone.on_pages_like(/[a-zA-Z]+-id[0-9]*\.html$/) do |page|
    doc = Hpricot(page.body) # parse the raw HTML of the page
    price = doc.at("//strong[@id='offerPrice']")
    location = doc.at("//dl[@class='stripeMe'] > dd")
    desc = doc.at("//div[@id='offerDesc'] > p")
    offer_no = doc.at("//div[@id='offerFoot'] p[@class='toLeft']/span/strong")
    created_at = doc.at("//div[@id='offerFoot'] p[@class='toRight']/span/strong")
    photos = doc.search("//div[@id='imageList']/p/a")
    # doc.at returns nil when an element is missing, so guard before printing
    puts "#{offer_no.inner_text.strip}: #{price.inner_text.strip}" if offer_no && price
  end
end
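The link filter in `focus_crawl` can be checked without hitting the network. Below is a sketch that pulls the two regular expressions into a hypothetical `follow_link?` helper (the negation of the `delete_if` condition) and runs it against made-up otodom-style URLs; the URLs are illustrative only, not taken from the real site. A link is kept if it is either another search-results page or an offer details page.

```ruby
require 'uri'

# Same patterns as in the crawler above.
SEARCH_RESULTS = /mod=search&act=searchResults&qid=/
DETAILS_PAGE   = /[a-zA-Z]+-id[0-9]*\.html$/

# Hypothetical helper: true if the crawler should follow this link.
def follow_link?(link)
  url = link.to_s
  SEARCH_RESULTS.match?(url) || DETAILS_PAGE.match?(url)
end

links = [
  URI("http://otodom.pl/index.php?mod=search&act=searchResults&qid=46911208&page=2"),
  URI("http://otodom.pl/mieszkanie-warszawa-id123456.html"),
  URI("http://otodom.pl/kontakt.html")
]

# Keeps the search-results page and the details page, drops kontakt.html.
kept = links.select { |link| follow_link?(link) }
kept.each { |link| puts link }
```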