
Suggestion for some improvements to the format with an example #6

@kohaugustine


Hi @techhat, following on from our discussion in #5, I've given the format some more thought and have experimented with different structures. I think I've arrived at one that might be worth considering. Here I'd like to share some ideas using a concrete example: a recipe taken from this website.

An example YAML file that encodes this recipe, with some modifications from the original ORF:

# name of the dish
dish_name: MUSTARD GREEN PORK RICE

# Name of the dish's originator 
creator: Cynthia Lim

# How many persons this recipe can feed
serves:
  min: 6 
  max: 8 # If there is no range in the number of persons, just fill in the max field

# Information about the recipe, such as its uses, occasions where it is served, and any historical notes
info: 
  - It is my all-time favourite comfort food

# The most specific date the recipe was created, if known, else leave blank
creation_date:

# Category of the dish's cuisine, and associated cultural and geographical information
category:
  race: CHINESE
  ethnic_group: 
  geography: SINGAPORE

# Information source where this recipe was obtained from, be it a website or a cookbook
source_info: 
  type: website
  # Either website or book name is provided, one is filled, the other is left empty
  url: http://mysingaporefood.com/recipe/mustard-green-pork-rice/
  book_name:

# Ingredients used in the recipe
ingredients:
  
  RICE:     
    amount: 2
    unit: CUP
    notes: 
      - Uncooked raw rice

  MUSTARD GREENS:    
    amount: 2
    unit: BUNCH
    notes:
      - Washed and cut into bite-size

  GARLIC: 
    amount: 8
    unit: CLOVE
    notes:
      - Mince the garlic

  GINGER: 
    amount: 0.25
    unit: PIECE
    notes:
      - Sliced thinly
  
  DRIED SHRIMPS: 
    amount: 30
    unit: GRAM
    notes:
      - Soaked in hot water, then drained

  PORK BELLY: 
    amount: 1
    unit: SLAB
    notes:
      - Marinated with 2 tablespoon of soya sauce and 1 teaspoon sesame oil
      - Sliced into bite-size

  DRIED MUSHROOMS:
    amount: 30
    unit: GRAM
    notes:
      - Soaked in hot water to soften, then drained
  
  SOYA SAUCE LIGHT:
    amount: 3
    unit: TABLESPOON
    notes:
      - 2 tbsp used for seasoning pork belly, 1 tbsp used to drizzle on top of cooked rice
    
  SESAME OIL:
    amount: 2
    unit: TEASPOON
    notes: 
      - 1 tsp used for sauteing, 1 tsp used to drizzle on top of cooked rice

# Steps to cook the dish
steps:
  1: Wash the rice twice and drain well.
  2: Heat up wok. Add sesame oil.
  3: Saute ginger till fragrant.
  4: Add garlic and saute.  
  5: Add dried shrimps and saute.
  6: Add mushrooms and saute till fragrant.
  7: Add pork belly and stir-fry till pork belly is half cooked.
  8: Add in rice.
  9: Add in soya sauce and stir well.
  10: Add in mustard greens. Stir well.
  11: Scoop mixture into rice cooker and cover with sufficient water.
  12: Cook as per instructions on rice cooker.
  13: Ready and serve.

Modifications and their rationale:

  • Make the ingredients in the recipe as accessible at the top level as possible. To this end, I've stopped marking the ingredient names with a dash and have instead nested them directly under the ingredients key. Each name carries information such as amounts and notes, which I've nested under the respective ingredient. This makes it possible to get a view object of the ingredient names immediately after parsing the YAML file for the flavor formula, just by calling the keys() method on the ingredients dictionary indexed out of the main data structure. More concretely, the code involved would simply be:
In [8]: import yaml

In [9]: with open("./mustard_green_pork_rice.yaml", "r") as f:
   ...:     fn = yaml.safe_load(f)  # safe_load parses plain data without constructing arbitrary objects
   ...:     

In [10]: fn["ingredients"].keys()
Out[10]: dict_keys(['DRIED MUSHROOMS', 'DRIED SHRIMPS', 'MUSTARD GREENS', 'SOYA SAUCE LIGHT', 'PORK BELLY', 'RICE', 'SESAME OIL', 'GINGER', 'GARLIC'])

This greatly simplifies access to the list of ingredients, an issue I raised earlier in #5.
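
As a rough sketch of what this access pattern allows, the snippet below walks the nested ingredient mapping and renders one line per ingredient. The dictionary literal is a hand-written stand-in for what the YAML parser would return for two of the ingredients above, and `format_ingredient` is a hypothetical helper introduced only for illustration.

```python
# Stand-in for the parsed YAML (two ingredients only); in practice this
# dictionary would come from parsing the recipe file.
recipe = {
    "ingredients": {
        "GARLIC": {"amount": 8, "unit": "CLOVE", "notes": ["Mince the garlic"]},
        "GINGER": {"amount": 0.25, "unit": "PIECE", "notes": ["Sliced thinly"]},
    }
}


def format_ingredient(name, details):
    """Hypothetical helper: render one ingredient entry as a single line."""
    # An empty `notes:` key in YAML parses to None, so fall back to an empty list.
    notes = "; ".join(details.get("notes") or [])
    line = f"{details['amount']} {details['unit']} {name}"
    return f"{line} ({notes})" if notes else line


# The ingredient names are simply the dictionary keys.
ingredient_names = list(recipe["ingredients"])

# Amounts, units and notes are one key lookup below each name.
lines = [format_ingredient(n, d) for n, d in recipe["ingredients"].items()]

for line in lines:
    print(line)
```

Because each ingredient is a key, a single entry can also be read directly, for example `recipe["ingredients"]["GINGER"]["amount"]`, without scanning a list.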

  • Eliminate unnecessary sequence generation and focus on dictionaries. The structure of the original format always resulted in the parsed Python object becoming a tangle of lists of dictionaries, owing to the extensive use of dashes, which is not an elegant representation. In making this modification, one guiding principle has been to prioritize dictionaries over lists: represent the layers as far as possible as a chain of dictionary keys, and use a list only at the bottom-most level of that chain.

  • Add new fields for recipe origin. As some research projects might study categorical or cultural aspects of recipes, I thought that including fields for such information would be useful from the perspective of provenance.
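
To make the last two points a little more concrete, here is a minimal sketch comparing a lookup in a dash-heavy layout with the same lookup in the dictionary-first layout, followed by a read of the new origin fields. `dash_style` is an assumed, simplified picture of a list-of-single-key-dictionaries layout, not a faithful copy of the original format, and `find_in_list` and `origin_summary` are hypothetical helper names.

```python
# Assumed, simplified dash-heavy layout: a list of one-key dictionaries.
dash_style = {
    "ingredients": [
        {"GARLIC": {"amount": 8, "unit": "CLOVE"}},
        {"GINGER": {"amount": 0.25, "unit": "PIECE"}},
    ]
}

# Dictionary-first layout, with the origin fields from the example recipe.
# Keys left empty in the YAML file parse to None.
dict_style = {
    "category": {"race": "CHINESE", "ethnic_group": None, "geography": "SINGAPORE"},
    "source_info": {
        "type": "website",
        "url": "http://mysingaporefood.com/recipe/mustard-green-pork-rice/",
        "book_name": None,
    },
    "ingredients": {
        "GARLIC": {"amount": 8, "unit": "CLOVE"},
        "GINGER": {"amount": 0.25, "unit": "PIECE"},
    },
}


def find_in_list(entries, name):
    """Hypothetical helper: scan a list of one-key dicts for an ingredient."""
    for entry in entries:
        if name in entry:
            return entry[name]
    raise KeyError(name)


def origin_summary(recipe):
    """Hypothetical helper: collect the provenance fields of one recipe."""
    category = recipe.get("category") or {}
    source = recipe.get("source_info") or {}
    return {
        "race": category.get("race"),
        "geography": category.get("geography"),
        # Only one of url / book_name is expected to be filled in.
        "source": source.get("url") or source.get("book_name"),
    }


from_list = find_in_list(dash_style["ingredients"], "GINGER")  # needs a scan
from_dict = dict_style["ingredients"]["GINGER"]                # plain key chain
summary = origin_summary(dict_style)
print(from_list == from_dict, summary["geography"])
```

Both lookups return the same data; the difference is only that the list layout needs a scan while the dictionary layout is a chain of keys.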

Looking forward to your thoughts on this. Thank you.
