-
Notifications
You must be signed in to change notification settings - Fork 29
Description
Hi @techhat , following on our discussion in an earlier post #5 , I've given some more thought to the format and have experimented with different structures. I think I've arrived at one which might be worth considering. Here I thought I would just share some ideas using a concrete example of a recipe taken from this website.
An example YAML file that encodes this recipe, with some modifications from the original ORF:
# name of the dish
dish_name: MUSTARD GREEN PORK RICE
# Name of the dish's originator
creator: Cynthia Lim
# How many persons this recipe can feed
serves:
min: 6
max: 8 # If there is no range in the number of persons, just fill in the max field
# Information about the recipe, such as its uses, occasions where it is served, and any historical notes
info:
- It is my all-time favourite comfort food
# The most specific date the recipe was created, if known, else leave blank
creation_date:
# Category of the dish's cuisine, and associated cultural and geographical information
category:
race: CHINESE
ethnic_group:
geography: SINGAPORE
# Information source where this recipe was obtained from, be it a website or a cookbook
source_info:
type: website
# Either website or book name is provided, one is filled, the other is left empty
url: http://mysingaporefood.com/recipe/mustard-green-pork-rice/
book_name:
# Ingredients used in the recipe
ingredients:
RICE:
amount: 2
units: CUP
notes:
- Uncooked raw rice
MUSTARD GREENS:
amount: 2
unit: BUNCH
notes:
- Washed and cut into bite-size
GARLIC:
amount: 8
unit: CLOVE
notes:
- Mince the garlic
GINGER:
amount: 0.25
unit: PIECE
notes:
- Sliced thinly
DRIED SHRIMPS:
amount: 30
unit: GRAM
notes:
- Soaked in hot water, then drained
PORK BELLY:
amount: 1
unit: SLAB
notes:
- Marinated with 2 tablespoon of soya sauce and 1 teaspoon sesame oil
- Sliced into bite-size
DRIED MUSHROOMS:
amount: 30
unit: GRAM
notes:
- Soaked in hot water to soften, then drained
SOYA SAUCE LIGHT:
amount: 3
unit: TABLESPOON
notes:
- 2 tbsp used for seasoning pork belly, 1 tbsp used to drizzle on top of cooked rice
SESAME OIL:
amount: 2
unit: TEASPOON
notes:
- 1 tsp used for sauteing, 1 tsp used to drizzle on top of cooked rice
# Steps to cook the dish
steps:
1: Wash the rice twice and drain well.
2: Heat up wok. Add sesame oil.
3: Saute ginger till fragrant.
4: Add garlic and saute.
5: Add dried shrimps and saute.
6: Add mushrooms and saute till fragrant.
7: Add pork belly and stir-fry till pork belly is half cooked.
8: Add in rice.
9: Add in soya sauce and stir well.
10: Add in mustard greens. Stir well.
11: Scoop mixture into rice cooker and cover with sufficient water.
12: Cook as per instructions on rice cooker.
13: Ready and serve.
Modifications and their rationale:
- Make the ingredients in the recipe as accessible at the top level as possible. To this goal, I've stopped making the ingredient names be marked with a dash, and have instead just put down their names as nested under the
ingredientskey. The names themselves are going to contain information like amounts and notes; I've made sure to nest these under each respective ingredient name. In this way, I've made it possible to make a generator object of ingredient names, immediately after parsing the YAML file for the flavor formula, by just calling thekeys()method on theingredientsdictionary that I index out of the main data structure. More concretely the code involved would simply be:
In [9]: with open("./mustard_green_pork_rice.yaml", "r") as f:
...: fn = yaml.load(f)
...:
In [10]: fn["ingredients"].keys()
Out[10]: dict_keys(['DRIED MUSHROOMS', 'DRIED SHRIMPS', 'MUSTARD GREENS', 'SOYA SAUCE LIGHT', 'PORK BELLY', 'RICE', 'SESAME OIL', 'GINGER', 'GARLIC'])
This greatly simplifies accessing the list of ingredients, an issue which I've raised in an earlier post numbered #5 .
-
Eliminate unncessary sequence generation, focus on dictionaries dictionaries The structure of the original format always resulted in the parsed python object becoming a huge mangle of lists of dictionaries. This was due to the extensive use of dashes. Not a pretty and elegant representation. In making this modification, one guiding principle has been to prioritize dictionaries over lists, to represent as much as possible, the layers as dictionary-keys-dictionary-keys and to only have the list be represented at the end at the bottom-most level of a dictionary-key chain.
-
Added new fields for recipe origin As some research projects might involve studying some categorical or cultural aspects of recipes, I thought that including these new fields related to such information would be useful from the perspective of provenance.
Looking forward to your thoughts on this. Thank you.