I have a plain HTML document with no CSS. I need to pass part of its content to an Excel sheet. I tried Nokogiri, but it works on a CSS basis.
Here is what I tried.
    <html>
    <head></head>
    <body>
    ***note*** <br>
    items <br> <br>
    invoice number : [78945824] po number : [4587958] <br>
    track : <a href="abc.com"> 12345</a> <br> <br>
    items <br> <br>
    invoice number : [79546828] po number : [4567892] <br> <br> <br>
    items <br> <br>
    invoice number : [78976824] po number : [897569] <br>
    track : <a href="abc.com"> 12345</a> <br>
    </body>
    </html>
I am able to retrieve the PO numbers and the tracking numbers:
    require 'rubygems'
    require 'nokogiri'
    require 'open-uri'

    page_url = "a.html"
    page = Nokogiri::HTML(open(page_url))

    # All visible text inside <body>, with the tags stripped
    data = page.css("body").text

    # Capture the PO number that follows each invoice number
    po_numbers = data.scan(/invoice number : \[\d+\] po number : \[(\d+)\]/).flatten

    # Text of every <a> element, split on whitespace
    tracking_numbers = page.css("a").text.split

    # Header row followed by one [po, tracking] row per index
    [["po number", "tracking number"]].concat(po_numbers.zip(tracking_numbers))

    puts po_numbers
    puts tracking_numbers

    => po_numbers = ["4587958", "4567892", "4587958"]
    => tracking_numbers = ["12543", "12356"]
When I zip them together, I get:

    => po_numbers.zip(tracking_numbers)
    => [["4587958", "12543"], ["4567892", "12356"], ["4587958", nil]]

What I want is:

    => [["4587958", "12543"], ["4567892", nil], ["4587958", "12356"]]
Try this:
    data = page.css("body").text
    # Remove spaces, then work line by line
    data = data.gsub(" ", "").split(/\n/)

    po = []
    track = []

    data.each do |i|
      # Labels are matched with their spaces removed; adjust them to your markup
      if i.include? "ponumber"
        po << i.split("ponumber:").last.scan(/\d+/)[0]
      end
      if i.include? "trackit"
        track << i.split("trackit:").last
      end
    end

    po.zip(track)
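Note that zipping two independently collected lists pairs values by index, so a record without a tracking link shifts every later tracking number up by one. One possible way around that, sketched below, is to split the markup into one block per record and read both values from the same block. This is only a sketch under assumptions: it assumes every record starts with the literal word "items" as in the sample, it uses plain Ruby string methods instead of Nokogiri so that it runs without any gems, and `po_tracking_pairs` is a hypothetical helper name. With Nokogiri, the same string could come from `page.at("body").inner_html`.

```ruby
# Sample markup from the question, inlined so the sketch runs on its own.
html = <<~HTML
  <html> <head></head> <body> ***note*** <br> items <br> <br>
  invoice number : [78945824] po number : [4587958] <br>
  track : <a href="abc.com"> 12345</a> <br> <br>
  items <br> <br>
  invoice number : [79546828] po number : [4567892] <br> <br> <br>
  items <br> <br>
  invoice number : [78976824] po number : [897569] <br>
  track : <a href="abc.com"> 12345</a> <br>
  </body> </html>
HTML

# Hypothetical helper: returns one [po_number, tracking_number] pair per record.
def po_tracking_pairs(html)
  # Assumption: each record begins with the word "items".
  # Everything before the first marker is not a record, so drop it.
  blocks = html.split(/\bitems\b/).drop(1)

  pairs = blocks.map do |block|
    po = block[/po number\s*:\s*\[(\d+)\]/, 1]
    next if po.nil? # skip blocks that carry no PO number

    # The tracking number is the text of the link, if this record has one.
    tracking = block[/<a\b[^>]*>\s*(\d+)\s*<\/a>/, 1]
    [po, tracking]
  end

  pairs.compact
end

pairs = po_tracking_pairs(html)
rows = [["po number", "tracking number"]] + pairs
rows.each { |row| p row }
```

Because both values are read from the same block, a record with no link gets `nil` in its own row and the later rows stay aligned.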