Chapter 3 Data transformation
3.1 Geospatial data
Using the restaurant data from Datafiniti, we wanted to visualize how vegan/vegetarian restaurants are distributed across US states. To do so, we extracted the PROVINCE column and used a US state mapper to align state names with the abbreviations in the data. We then created a new dataframe containing only the province and its number of vegan/vegetarian restaurants, and renamed these two columns to “region” and “value” to match the column names expected by the plotting library we used.
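The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the lowercase `province` column name, the `US_STATE_NAMES` dictionary, and the `state_counts` helper are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample standing in for the Datafiniti restaurant data;
# only the province column (state abbreviations) matters here.
restaurants = pd.DataFrame(
    {
        "name": ["Green Leaf", "Veggie House", "Plant Cafe", "Tofu Bar"],
        "province": ["NY", "CA", "NY", "TX"],
    }
)

# Illustrative subset of an abbreviation-to-name mapper (assumed structure).
US_STATE_NAMES = {"NY": "New York", "CA": "California", "TX": "Texas"}


def state_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count restaurants per state and return 'region' and 'value' columns."""
    # Abbreviations missing from the mapper become NaN and are not counted.
    regions = df["province"].map(US_STATE_NAMES)
    counts = regions.value_counts().rename_axis("region").reset_index(name="value")
    return counts.sort_values("region").reset_index(drop=True)


state_df = state_counts(restaurants)
print(state_df)
```

Abbreviations that are not in the mapper are silently dropped from the counts in this sketch, so it may be worth checking for unmapped values before plotting.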
3.2 Interactive component data
We designed the interactive component to display the number of participants who selected each level of the motivation variables (1 - Strongly Disagree to 5 - Strongly Agree). To support this, we created a dataset with the nine motivation variables as columns and re-coded the values to indicate 1 if a person was motivated, else 0. The reasons are as follows:
- Animal Protection
- Environment
- Cost
- Health
- Religious and Spiritual
- Social Influence
- Social Justice and World Hunger
- Food Trends
- Feeling of Disgust
We derived five columns for each of the above reasons, one per Likert scale level (1-5). We then computed the count for each reason at each level.
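One way to derive the per-level columns and their counts is sketched below. It assumes each motivation is stored as an integer from 1 to 5; the sample responses, the `Reason_level` column naming, and the `level_indicators` helper are hypothetical and only illustrate the idea.

```python
import pandas as pd

LIKERT_LEVELS = [1, 2, 3, 4, 5]

# Hypothetical responses: one row per participant, one column per motivation.
responses = pd.DataFrame(
    {
        "Health": [5, 4, 4, 2, 5],
        "Cost": [1, 3, 3, 2, 5],
    }
)


def level_indicators(df: pd.DataFrame) -> pd.DataFrame:
    """Build one 0/1 column per (reason, level) pair."""
    columns = {
        f"{reason}_{level}": (df[reason] == level).astype(int)
        for reason in df.columns
        for level in LIKERT_LEVELS
    }
    return pd.DataFrame(columns)


indicators = level_indicators(responses)

# Summing each indicator column gives the count per reason and level.
level_counts = indicators.sum()
print(level_counts)
```

Because every participant picks exactly one level per reason, each row of `indicators` sums to the number of reasons, which gives a quick sanity check on the recoding.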
3.3 Cleaning of the categorical data
Most of the data capturing participants' opinions of and experiences with a vegan/vegetarian diet is recorded as categorical variables.
Because this categorical data takes the form of opinions, we counted the occurrences of each opinion, grouped by other categorical variables. For example, we counted the number of participants who agreed with a particular reason for inconvenience. For the inconvenience variables, we reduced the scale from 1-5 to 1-3 as follows:
- 1 indicating disagreement with a particular reason of inconvenience
- 2 indicating a neutral opinion (neither agree nor disagree)
- 3 indicating agreement with a particular reason of inconvenience
ALLLENGTH | ALLINCONVENIENCE3D | ALLINCONVENIENCE4D |
---|---|---|
0-3 months | 124 | 94 |
1-2 years | 65 | 51 |
3-5 years | 36 | 27 |
4-11 months | 77 | 46 |
6-9 years | 19 | 15 |
9 or more years | 28 | 20 |
No idea | 15 | 27 |
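The scale reduction and grouped counting described in this section could look roughly like the sketch below. The exact mapping from the five-point to the three-point scale is not stated in the text, so the `COLLAPSE` dictionary (1 and 2 to 1, 3 to 2, 4 and 5 to 3) is an assumption, as are the sample rows, the `inconvenience_reason` column, and the `agreement_counts` helper.

```python
import pandas as pd

# Assumed mapping from the 5-point scale to the 3-point scale.
COLLAPSE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}

# Hypothetical survey rows; "inconvenience_reason" is an invented column name.
survey = pd.DataFrame(
    {
        "ALLLENGTH": ["0-3 months", "0-3 months", "1-2 years", "1-2 years", "1-2 years"],
        "inconvenience_reason": [5, 2, 4, 3, 4],
    }
)


def agreement_counts(df: pd.DataFrame, group_col: str, reason_col: str) -> pd.Series:
    """Count, per group, the participants who agree (collapsed value 3)."""
    collapsed = df[reason_col].map(COLLAPSE)
    agrees = collapsed.eq(3)
    return agrees.groupby(df[group_col]).sum()


counts = agreement_counts(survey, "ALLLENGTH", "inconvenience_reason")
print(counts)
```

Repeating the call for each inconvenience column and joining the results would give one count column per reason, with one row per group.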
The cleaning was done in Python. The code can be found here