Scala - Regex in Apache Spark


I have a text file that reads like this:

This recipe can be made with either a stand mixer, or by hand with a bowl,
a wooden spoon, and strong arms. If you use salted butter, please omit
the added salt in the recipe.
yum
ingredients
1 1/4 cups all-purpose flour (160 g)
1/4 teaspoon salt
1/2 teaspoon baking powder
1/2 cup unsalted butter (1 stick, or 8 tbsp, or 112g) @ room temperature
1/2 cup white sugar (90 g)
1/2 cup dark brown sugar, packed (85 g)
1 large egg
1 teaspoon vanilla extract
1/2 teaspoon instant coffee granules or instant espresso powder
1/2 cup chopped macadamia nuts (3 1/2 ounces, or 100 g)
1/2 cup white chocolate chips
method
1 Preheat the oven to 350°F (175°C). Vigorously whisk the flour,
salt, and baking powder in a bowl, and set aside.

I want to extract the data between the words "ingredients" and "method".
I have written the regex (?s)(?<=\bingredients\b).*?(?=\bmethod\b)
to extract the data, and it works fine on its own.
But when I try it in spark-shell as follows, it doesn't give me
anything:

val b = sc.textFile("/home/akshat/file.txt")
val regex = "(?s)(?<=\bingredients\b).*?(?=\bmethod\b)".r
regex.findAllIn(b).foreach(println)

Please tell me where I am going wrong and what steps I should take to
correct this.
Thanks in advance!
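One thing worth checking in the Scala snippet, independent of how the file is read: in an ordinary Scala string literal, \b is the backspace escape, so the compiled regex looks for backspace characters instead of word boundaries. Writing the pattern as a triple-quoted string, """(?s)(?<=\bingredients\b).*?(?=\bmethod\b)""".r, avoids this. Python's non-raw strings behave the same way. The sketch that follows uses Python to match the answer's code, and the sample text is a hypothetical, shortened excerpt of the file; it shows the raw pattern matching while the escaped one finds nothing.

```python
import re
from typing import Optional

# Regex from the question. The raw string r"..." keeps each \b as the
# regex word-boundary escape.
RAW = r"(?s)(?<=\bingredients\b).*?(?=\bmethod\b)"

# Same characters in an ordinary string literal: Python turns each \b into
# a backspace character (U+0008), just as an ordinary Scala "..." literal does.
COOKED = "(?s)(?<=\bingredients\b).*?(?=\bmethod\b)"

# Hypothetical, shortened stand-in for the recipe file.
SAMPLE = "yum\ningredients\n1 large egg\nmethod\n1 preheat oven"


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first match of pattern in text, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(repr(first_match(RAW, SAMPLE)))     # the text between the two words
    print(repr(first_match(COOKED, SAMPLE)))  # no match: it looks for backspaces
```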

What you need to do is:

  1. Read the file using wholeTextFiles (so that it does not break the file into lines, but reads the entire data together).
  2. Write a function that takes a string and outputs a string using your regex, for example (in Python):

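import re

def getwhatineed(s):
    # s is the whole content of one file; return the text between "ingredients" and "method"
    m = re.search(r"(?s)(?<=\bingredients\b).*?(?=\bmethod\b)", s)
    return m.group(0) if m else ""

b = sc.wholeTextFiles(...)          # RDD of (path, content) pairs
c = b.values().map(getwhatineed)    # apply the function to each file's content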
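Step 1 matters because textFile yields one element per line, and no single line contains both "ingredients" and "method", so a regex applied line by line has nothing to match. A minimal pure-Python sketch (no Spark; a plain string split into lines stands in for the RDD, and the sample text is a hypothetical excerpt) contrasts the two ways of reading:

```python
import re
from typing import List

PATTERN = re.compile(r"(?s)(?<=\bingredients\b).*?(?=\bmethod\b)")

# Hypothetical, shortened stand-in for /home/akshat/file.txt.
TEXT = "yum\ningredients\n1/4 teaspoon salt\n1 large egg\nmethod\n1 preheat oven"


def per_line_matches(text: str) -> List[str]:
    """Roughly what sc.textFile gives: the regex sees one line at a time."""
    found = []
    for line in text.splitlines():
        m = PATTERN.search(line)
        if m:
            found.append(m.group(0))
    return found


def whole_text_matches(text: str) -> List[str]:
    """Roughly what sc.wholeTextFiles gives: the regex sees the entire content."""
    # The pattern has no capture groups, so findall returns the full matches.
    return PATTERN.findall(text)


if __name__ == "__main__":
    print(per_line_matches(TEXT))
    print(whole_text_matches(TEXT))
```

In PySpark, the whole-text behaviour corresponds to running the function over the content half of each (path, content) pair that wholeTextFiles returns.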

Now c is an RDD, so you need to collect it before you can print it. The output of collect() is a normal array/list:

print(c.collect())
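collect() brings every element of the RDD back to the driver as a list, here one extracted string per input file. As a local stand-in, assuming a plain Python list of (path, content) pairs in place of the RDD that wholeTextFiles returns (the paths and contents are hypothetical), the pipeline and the shape of its result look like this:

```python
import re
from typing import List, Tuple

PATTERN = re.compile(r"(?s)(?<=\bingredients\b).*?(?=\bmethod\b)")

# Hypothetical (path, content) pairs, mimicking the elements of wholeTextFiles.
FILES: List[Tuple[str, str]] = [
    ("file:/tmp/a.txt", "ingredients\n1 large egg\nmethod\nbake"),
    ("file:/tmp/b.txt", "no section markers here"),
]


def extract(content: str) -> str:
    """Text between 'ingredients' and 'method', stripped; '' if absent."""
    m = PATTERN.search(content)
    return m.group(0).strip() if m else ""


def collect_like(pairs: List[Tuple[str, str]]) -> List[str]:
    """Local analogue of b.values().map(extract).collect(): one result per file."""
    return [extract(content) for _path, content in pairs]


if __name__ == "__main__":
    for block in collect_like(FILES):
        # Each result is an ordinary string, so it can be split into ingredient lines.
        print(block.splitlines())
```

Files without both markers come back as empty strings, which keeps the result list aligned with the input files.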