ruby on rails - Nokogiri parsing: a missing element creates an issue


I have a plain HTML document with no CSS. I need to pass some of its content to an Excel sheet. I tried Nokogiri, but it works on a CSS-selector basis.

So I tried the following on this document:

<html>
  <head></head>
  <body>
    ***note***
    <br>
    items
    <br>
    <br>
    invoice number : [78945824] po number : [4587958]
    <br>
    track : <a href="abc.com"> 12345</a>
    <br>
    <br>
    items
    <br>
    <br>
    invoice number : [79546828] po number : [4567892]
    <br>
    <br>
    <br>
    items
    <br>
    <br>
    invoice number : [78976824] po number : [897569]
    <br>
    track : <a href="abc.com"> 12345</a>
    <br>
  </body>
</html>

I am able to retrieve the PO numbers and the tracking numbers:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

page_url = "a.html"
page = Nokogiri::HTML(open(page_url))

# All text inside <body>, with the tags stripped
data = page.css("body").text

# One PO number per "invoice number : [...] po number : [...]" occurrence
po_numbers = data.scan(/invoice number : \[\d+\] po number : \[(\d+)\]/).flatten
# Text of every <a> element, split on whitespace
tracking_numbers = page.css("a").text.split

# Header row followed by the paired values
[["po number", "tracking number"]].concat(po_numbers.zip(tracking_numbers))

puts po_numbers
puts tracking_numbers

=> po_numbers = ["4587958", "4567892", "4587958"]
=> tracking_numbers = ["12543", "12356"]

When I zip them together, I get:

=> po_numbers.zip(tracking_numbers)
=> [["4587958", "12543"], ["4567892", "12356"], ["4587958", nil]]

What I want is:

=> [["4587958", "12543"], ["4567892", nil], ["4587958", "12356"]]

Try this:

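data = page.css("body").text
# Remove every space, then split the text into lines
data = data.gsub(" ", "").split(/\n/)
po = []
track = []
data.each do |i|
  if i.include? "ponumber"
    po << i.split("ponumber:").last.scan(/\d+/)[0]
  end
  # Label with spaces stripped; use "track" if the page reads "track :"
  if i.include? "trackit"
    track << i.split("trackit:").last
  end
end
po.zip(track)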
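
Note that zipping two separately collected lists can still shift the tracking numbers when an item has no link. Here is a minimal sketch of one way around that, assuming the body text looks like the sample document in the question. The `text` heredoc is a stand-in for `page.css("body").text`, and `row_pattern`, `rows` and `table` are hypothetical names. The idea is to capture the PO number and an optional tracking number in a single regex pass, so a missing tracking number becomes nil in the right row.

```ruby
# Stand-in for page.css("body").text (assumed layout, based on the sample HTML).
text = <<~TEXT
  ***note***
  items

  invoice number : [78945824] po number : [4587958]
  track :  12345

  items

  invoice number : [79546828] po number : [4567892]

  items

  invoice number : [78976824] po number : [897569]
  track :  12345
TEXT

# Group 1 is the PO number, group 2 the tracking number.
# The trailing (?: ... )? is optional, so group 2 is nil
# when an item has no "track" line after its PO number.
row_pattern = /invoice number : \[\d+\] po number : \[(\d+)\](?:\s*track\s*:\s*(\d+))?/

# With two capture groups, scan returns one [po, track] pair per match.
rows = text.scan(row_pattern)

# Header row first, as in the question.
table = [["po number", "tracking number"]].concat(rows)

# Print comma-separated lines; a missing tracking number is shown as "nil".
table.each { |po, track| puts [po, track || "nil"].join(",") }
```

With the sample text above, `rows` is `[["4587958", "12345"], ["4567892", nil], ["897569", "12345"]]`. If the label or spacing on the real page differs, the pattern needs adjusting.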
