I have a text file that reads like this:
this recipe can be made with either a stand mixer, or by hand with a bowl,
a wooden spoon, and strong arms. if you use salted butter, please omit
the added salt in the recipe.
yum
ingredients
1 1/4 cups all-purpose flour (160 g)
1/4 teaspoon salt
1/2 teaspoon baking powder
1/2 cup unsalted butter (1 stick, or 8 tbsp, or 112g) @ room temperature
1/2 cup white sugar (90 g)
1/2 cup dark brown sugar, packed (85 g)
1 large egg
1 teaspoon vanilla extract
1/2 teaspoon instant coffee granules or instant espresso powder
1/2 cup chopped macadamia nuts (3 1/2 ounces, or 100 g)
1/2 cup white chocolate chips
method
1 preheat the oven to 350°f (175°c). vigorously whisk the flour
and baking powder in a bowl, and set aside.
I want to extract the data between the words ingredients and method.
I have written the regex (?s)(?<=\bingredients\b).*?(?=\bmethod\b)
to extract the data, and it's working fine.
But when I try to use it in spark-shell as follows, it doesn't give me
anything:
val b = sc.textFile("/home/akshat/file.txt")
val regex = "(?s)(?<=\bingredients\b).*?(?=\bmethod\b)".r
regex.findAllIn(b).foreach(println)
Please tell me where I am going wrong and what steps I should take to
correct this.
Thanks in advance!
What you need is to:
- read the file using wholeTextFiles (so that it does not break the input into lines, and reads the entire data together)
- write a function that takes a string and outputs a string using your regex. So it may look like this (in Python):
def getwhatineed(s):
    output = <my regexp>
    return output

b = sc.wholeTextFiles(...)
c = b.map(getwhatineed)
Now, c is an RDD. You need to collect it before you print it. The output of collect is a normal array/list:
print(c.collect())
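To make the function above concrete, here is a minimal sketch using Python's re module with the regex from the question. It assumes each record is a (path, content) pair, which is what wholeTextFiles produces; the sample tuple and its file name are made up for illustration and stand in for one element of the RDD, so the snippet runs without Spark.

```python
import re

# Same pattern as in the question. The raw string (r"...") keeps \b as a
# regex word boundary instead of a backspace character.
PATTERN = re.compile(r"(?s)(?<=\bingredients\b).*?(?=\bmethod\b)")


def getwhatineed(record):
    """Return the text between 'ingredients' and 'method', or '' if absent."""
    # Assumption: record is a (path, content) pair, as yielded by wholeTextFiles.
    _path, content = record
    match = PATTERN.search(content)
    return match.group(0).strip() if match else ""


# Hypothetical stand-in for one element of the RDD (made-up path and text).
sample = (
    "file.txt",
    "yum\ningredients\n1 large egg\n1/4 teaspoon salt\nmethod\n1 preheat oven",
)
print(getwhatineed(sample))
```

With Spark, the same function should be usable unchanged as the argument to map, that is c = b.map(getwhatineed) followed by c.collect(). Returning an empty string for files without both markers is a design choice of this sketch, not something the original answer specifies.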