Chapter 3 Data transformation

3.1 Geospatial data

While utilizing the restaurant data from Datafiniti, we were interested in visualizing the spread of vegan/vegetarian restaurants across the US by state. We extracted the PROVINCE column to do so, and used the US state mapper to align state names to the abbreviations within the data. A new dataframe containing only the Province and the number of vegan/vegetarian restaurants was created. These columns were renamed to “region” and “value” to be compatible with the utilized library.

3.2 Interactive component data.

We designed the interactive component to display the counts of individuals that agreed to each level of the motivation variables (1- Strongly Disagree to 5-Strongly Agree). As a result we created a dataset isolating the seven motivations variables as columns, and re-coded the values to indicate 1 if a person was motivated, else 0. The reasons are as follows:

  • Animal Protection
  • Environment
  • Cost
  • Health
  • Religious and Spiritual
  • Social Influence
  • Social Justice and World Hunger
  • Food Trends
  • Feeling of Disgust

We derived 5 columns for each of the above reasons (according to Likert Scale levels 1-5). The counts for each reason and each column are taken into consideration.

3.3 Cleaning of the categorical data.

Most of the data capturing participant opinions/experiences with a vegan/vegetarian diet is recorded as a categorical variable.

All the categorical data that we have is present in the form of opinions. So, we counted the occurence of each opinion based on different categorical variables. For example, we count the number of users who agree with a particular reason that has caused inconvineince. For the inconvenience variable, we decided to reduce the scale from 1-5 to 1-3 as follows:

  • 1 indicating Disagreement to reason of inconvenience
  • 2 indicating Neutral opinion (neither agree nor disagree)
  • 3 indicating Agreement to a particular reason of inconvenience
We generated the counts for each inconvenience reason. We applied a similar approach to Transition and Length of diet variables which represent the amount of time that one required to transition and for how long they have been following their diet. Every column required for the analysis is decoded and renamed for convenience. For example, ALLINCONVENIENCE3D represents the total counts of people that indicated agreement in experience trouble finding vegan/vegetarian options at restaurants . For part of our analyses we were interested in how long people maintained a vegan/vegetarian diet for this particular concern. To extract this, we transformed the data as indicated below:
ALLLENGTH ALLINCONVENIENCE3D ALLINCONVENIENCE4D
0-3 months 124 94
1-2 years 65 51
3-5 years 36 27
4-11 months 77 46
6-9 years 19 15
9 or more years 28 20
No idea 15 27

The cleaning was done in python. The code can be found here