When guests arrive at a great restaurant, the chef and the cooks have already planned and assembled everything they need to quickly deliver excellence on a plate. Their process, called mise en place, is used by chefs all over the world. Emerging after French chef Georges Escoffier introduced his system of cooking in the early 20th century, it yields a repeatable process that rewards efficiency and excellence. Adapting ideas from Working Clean by Dan Charnas, we too can improve our chances of delivering excellence by planning our steps, assembling our resources, keeping our project folders clean, and documenting our process before finishing for the day, all while checking for errors at every opportunity.
When we start any Data Science project, we have a plan in mind. How many of us take the time to write it down? Students at the Culinary Institute of America learn this from the very beginning (Charnas, 263). Not only do they write down their steps, but they sequence them in an efficient order. In the fast pace of service, they can then rely on their plan to get them through the day efficiently and with fewer mistakes. It's time for us to do the same. For example, when we start an exploratory data analysis (EDA), we, too, need a plan: identify the data of interest, understand what each record describes, summarize columns, and so on. Access to the data rarely appears on its own, so our first step is to request it and, while awaiting permissions and access methods, prepare for analysis. You might argue that it's trivial to remember to make this request, but if you're asked on a busy Monday to start the work on Wednesday, you might think you have two days before anything needs to happen. Wait until Wednesday to request access, and you'll already be two days behind. It's better to plan early so that your permissions and access are ready before it's time to analyze the data.
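One lightweight way to write the plan down is to make it executable: name each step, sequence the steps in a list, and run them in order. The sketch below is in Python (the article's examples could equally be R scripts), and every step name and detail is an illustrative assumption, not a fixed recipe.

```python
# A written, sequenced EDA plan: each step is named and run in order,
# so the plan itself is documented and repeatable.
# All step names and values here are hypothetical, for illustration only.

def request_data_access(state):
    # Filed on day one, so permissions arrive before analysis begins.
    state["access_requested"] = True
    return state

def identify_data_of_interest(state):
    state["tables"] = ["orders", "customers"]  # hypothetical tables
    return state

def describe_each_record(state):
    state["grain"] = "one row per order"  # what each record describes
    return state

def summarize_columns(state):
    state["summaries"] = {t: "pending" for t in state["tables"]}
    return state

# The sequenced plan, written down before the work begins.
EDA_PLAN = [
    request_data_access,  # first, because waiting costs calendar days
    identify_data_of_interest,
    describe_each_record,
    summarize_columns,
]

def run_plan(plan):
    state = {}
    for step in plan:
        state = step(state)
        print(f"done: {step.__name__}")
    return state

final_state = run_plan(EDA_PLAN)
```

Because the plan is just a list, reordering or adding a step is a one-line change, and the printed step names double as a running log of where the work stands.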
When a chef prepares a menu, they can rely on the type of restaurant and their tastes to help them narrow the list. Consider keeping a list of code snippets that can be added to your projects as you need them, just like adding ingredients to a recipe you're building. Of course, any good chef adjusts the recipe depending on the ingredients in season. We can do the same, knowing that different data and projects have different needs.
At the end of the day, the chef puts all of their reusable ingredients in containers, labels them, and puts them away for tomorrow, throwing away unusable leftovers and putting their dirty dishes in the dishwasher (Charnas, 11). We also have unusable leftovers and dirty dishes, such as abandoned scripts and temporary output. By deleting them, we won't be confused tomorrow or next year when we reopen the project folder. We label everything we keep by checking our code into Git and pushing it to GitHub. As scientists, we also need to go further than the cook and document everything we did. If we don't document it, it didn't happen. Then, when the client asks questions about our analysis, we have notes to help us answer them, or unreported intermediate results to show why we made the decisions we did during the analysis. But don't wait until the end of the day. If you work clean as you go and document in near real-time, you will have less to do when you finish.
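Documenting in near real-time can be as simple as appending timestamped notes to a project journal as decisions are made. A minimal sketch in Python follows; the journal file name and the example entries are assumptions for illustration.

```python
from datetime import datetime, timezone
from pathlib import Path

JOURNAL = Path("project_journal.md")  # hypothetical journal file name

def log_entry(note: str, journal: Path = JOURNAL) -> None:
    """Append a timestamped note so decisions are captured as they happen."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    with journal.open("a", encoding="utf-8") as f:
        f.write(f"- {stamp}: {note}\n")

# Example entries, logged the moment each decision is made (invented data).
log_entry("Dropped 12 duplicate order records; kept the latest per order_id.")
log_entry("Client confirmed the analysis window: 2021 Q1 only.")
```

A plain append-only file like this costs seconds per entry, lives in the project folder, and can be checked into Git alongside the code it explains.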
Working clean allows us to concentrate on the good stuff: our R scripts, reports, or PowerPoint slides. Just like a chef tasting their sauces and other ingredients before assembling them onto the plate, we need to check our data at every step, starting with the very first load. Did we check that all of the records loaded? Are any records duplicated, and if so, is that acceptable? Is any data missing in the middle of a sequence of date-dependent records? By checking as we go, we can fix problems when they are less costly and faster to resolve.
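These load-time checks can be sketched in a few lines of plain Python; the records and the expected count below are invented for illustration, and the same checks translate directly to R or pandas.

```python
from datetime import date, timedelta

# Hypothetical daily records, as if just loaded from a file.
records = [
    {"day": date(2021, 1, 1), "orders": 120},
    {"day": date(2021, 1, 2), "orders": 95},
    {"day": date(2021, 1, 2), "orders": 95},   # duplicate row
    {"day": date(2021, 1, 4), "orders": 110},  # Jan 3 is missing
]

# 1. Did all of the records load? Compare against an expected count
#    (e.g., a row count reported by the data owner).
expected_count = 4
assert len(records) == expected_count, "row count does not match the source"

# 2. Are any records duplicated? If so, decide whether that is acceptable.
seen, duplicates = set(), []
for r in records:
    key = (r["day"], r["orders"])
    if key in seen:
        duplicates.append(key)
    seen.add(key)

# 3. Is any data missing in a sequence of date-dependent records?
days = sorted({r["day"] for r in records})
all_days = (days[0] + timedelta(n) for n in range((days[-1] - days[0]).days + 1))
gaps = [d for d in all_days if d not in set(days)]

print("duplicates:", duplicates)
print("date gaps:", gaps)
```

Run against this sample, the duplicate check flags the repeated January 2 row and the gap check flags the missing January 3, exactly the kind of problem that is cheap to fix at load time and expensive to discover after the report ships.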
Finally, before we turn over our analysis, we should take one last look at what we are delivering. Just as the chef ensures that the completed meal is delicious and pleasing to the eye, we need to ensure that the insights are explained clearly with text, tables, and figures. Of course, that is only step one in the final check. Peers (other data scientists) and colleagues (other team members) should also check our work to determine quality and applicability to the audience. Just like some guests don’t like spicy foods, some clients won’t understand a histogram. It may take adjustments to make our final results best suited for this client.
Once we’ve delivered the final result, it’s time once again to work clean, document, put away our project, and start the next one. We can feel confident that we performed our analysis efficiently because we planned every step, we had all of our resources ready when we needed them, and our project is reproducible because we got rid of the perishables, checked in all of our code, and documented the project. Our client receives a deliverable built with efficiency and quality in mind at every step. Let’s take advantage of the hundred years spent developing mise en place and put it to use, helping us deliver excellence.
A Mise en Place Inspired Checklist
Applied to every Data Science project, this checklist helps us deliver results with efficiency and quality built into every step.
1. Does your project have a plan?
2. Does your plan include a list of resources?
3. Is all of your code checked in?
4. Do you have built-in quality checks?
5. Do you have a project journal?
6. Is your work reproducible?
Charnas D. Everything in Its Place: The Power of Mise-en-Place to Organize Your Life, Work, and Mind. Rodale; 2017.