![[Formales]]
![[coffee_shop_data_set__alvcoffeestock_007.png]]
[Coffee Shop Business Model In A Nutshell](https://fourweekmba.com/how-to-increase-sales-for-your-local-business-coffee-shop/)<br> Author: *Gennaro Cuofano*
> [!TLDR] Coffee Shop Dataset
> - The coffee shop dataset contains 500 records and 21 variables, reflecting a variety of realistic attributes one might observe in a coffee shop's transactional data.
> - This dataset can be used for various analyses, such as customer behavior analysis, sales trends, the impact of weather on sales, and more, providing a rich context for understanding business operations in a coffee shop setting.
# Variables
![[coffee_shop_dataset_variables__alvkaffee_064.png]]
> [!multi-column]
>
>> [!NOTE] transaction_id
>> A unique identifier for each transaction.
>
>> [!NOTE] customer_name
>> The name of the customer making the purchase.
> [!multi-column]
>
>> [!NOTE] customer_email
>> The email address of the customer.
>
>> [!NOTE] purchase_date
>> The date when the purchase was made.
> [!multi-column]
>
>> [!NOTE] purchase_time
>> The time when the purchase was made.
>
>> [!NOTE] coffee_type
>> The type of coffee purchased (e.g., espresso, latte).
> [!multi-column]
>
>> [!NOTE] coffee_size
>> The size of the coffee purchased (small, medium, large).
>
>> [!NOTE] coffee_price
>> The price of the coffee, which is adjusted based on the size of the coffee.
> [!multi-column]
>
>> [!NOTE] quantity
>> The number of coffee items purchased in the transaction.
>
>> [!NOTE] barista_name
>> The name of the barista who prepared the coffee.
> [!multi-column]
>
>> [!NOTE] payment_type
>> The method of payment used (e.g., cash, credit card, mobile payment).
>
>> [!NOTE] rating
>> The customer's rating of their experience, from 1 to 5.
> [!multi-column]
>
>> [!NOTE] weather_condition
>> The weather condition at the time of purchase (e.g., sunny, rainy).
>
>> [!NOTE] temperature
>> The temperature at the time of purchase.
> [!multi-column]
>
>> [!NOTE] weekday
>> The day of the week when the purchase was made.
>
>> [!NOTE] is_member
>> Indicates whether the customer is a member of the coffee shop's loyalty program.
> [!multi-column]
>
>> [!NOTE] discount_applied
>> The discount applied to the purchase, if any, stored as a fraction (0 to 0.25, i.e. up to 25%).
>
>> [!NOTE] tips
>> The tip amount given; tips on Saturdays and Sundays are scaled up by a factor of 1.5.
> [!multi-column]
>
>> [!NOTE] feedback
>> The customer's feedback about their experience in a sentence.
>
>> [!NOTE] seating_area
>> The preferred seating area of the customer at the time of purchase (indoor, outdoor).
> [!multi-column]
>
>> [!NOTE] wifi_usage
>> The level of WiFi usage by the customer during their visit (none, light, moderate, heavy).
>
>> [!BLANK]
>>
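
The variable list above can be turned into a quick sanity check for any DataFrame that is supposed to follow this schema. The following is a minimal sketch, not part of the original note: the names `EXPECTED_COLUMNS`, `ALLOWED_VALUES` and `validate_coffee_shop_frame` are hypothetical, and only the value sets that the list states in full (size, seating area, WiFi usage, rating range) are checked.

```python
import pandas as pd

# Column names taken from the variable list above.
EXPECTED_COLUMNS = [
    "transaction_id", "customer_name", "customer_email", "purchase_date",
    "purchase_time", "coffee_type", "coffee_size", "coffee_price", "quantity",
    "barista_name", "payment_type", "rating", "weather_condition",
    "temperature", "weekday", "is_member", "discount_applied", "tips",
    "feedback", "seating_area", "wifi_usage",
]

# Only the categories that the variable list enumerates completely.
ALLOWED_VALUES = {
    "coffee_size": {"small", "medium", "large"},
    "seating_area": {"indoor", "outdoor"},
    "wifi_usage": {"none", "light", "moderate", "heavy"},
}


def validate_coffee_shop_frame(df):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        # Without the full schema the remaining checks cannot run reliably.
        problems.append(f"missing columns: {missing}")
        return problems
    if df["transaction_id"].duplicated().any():
        problems.append("transaction_id is not unique")
    if not df["rating"].between(1, 5).all():
        problems.append("rating outside 1..5")
    for column, allowed in ALLOWED_VALUES.items():
        unexpected = set(df[column].unique()) - allowed
        if unexpected:
            problems.append(f"{column} has unexpected values: {sorted(unexpected)}")
    return problems
```

Calling `validate_coffee_shop_frame(df)` on a generated dataset should return an empty list; any entry in the result names the column that deviates from the description.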
# Python
![[coffee_shop_dataset_python__alvkaffee_059.png]]
```python
from faker import Faker
import random
import pandas as pd

# Initialize Faker
fake = Faker()


# Define function to create a dataset for the coffee shop
def create_coffee_shop_dataset(num_records):
    # Draw the purchase dates first so that weekday can be derived from them
    # (otherwise the two columns would contradict each other)
    purchase_dates = [fake.date_between(start_date='-2y', end_date='today') for _ in range(num_records)]

    # Data variables and their distributions
    data = {
        "transaction_id": [fake.unique.uuid4() for _ in range(num_records)],
        "customer_name": [fake.name() for _ in range(num_records)],
        "customer_email": [fake.email() for _ in range(num_records)],
        "purchase_date": purchase_dates,
        "purchase_time": [fake.time() for _ in range(num_records)],
        "coffee_type": [random.choice(['espresso', 'latte', 'cappuccino', 'americano', 'mocha']) for _ in range(num_records)],
        "coffee_size": [random.choice(['small', 'medium', 'large']) for _ in range(num_records)],
        "coffee_price": [round(random.uniform(2.5, 5.0), 2) for _ in range(num_records)],
        "quantity": [random.randint(1, 5) for _ in range(num_records)],
        "barista_name": [fake.first_name() for _ in range(num_records)],
        "payment_type": [random.choice(['cash', 'credit card', 'mobile payment']) for _ in range(num_records)],
        "rating": [random.choice(range(1, 6)) for _ in range(num_records)],
        "weather_condition": [random.choice(['sunny', 'cloudy', 'rainy', 'snowy']) for _ in range(num_records)],
        "temperature": [round(random.uniform(20, 30), 1) for _ in range(num_records)],
        # Day name of each purchase date, e.g. 'Saturday'
        "weekday": [d.strftime('%A') for d in purchase_dates],
        "is_member": [random.choice([True, False]) for _ in range(num_records)],
        # Roughly 20% of purchases get a discount of up to 25%
        "discount_applied": [round(random.uniform(0, 0.25), 2) if random.random() > 0.8 else 0 for _ in range(num_records)],
        # Roughly half of the purchases include a tip of up to 5
        "tips": [round(random.uniform(0, 5), 2) if random.random() > 0.5 else 0 for _ in range(num_records)],
        "feedback": [fake.sentence() for _ in range(num_records)],
        "seating_area": [random.choice(['indoor', 'outdoor']) for _ in range(num_records)],
        "wifi_usage": [random.choice(['none', 'light', 'moderate', 'heavy']) for _ in range(num_records)],
    }

    # Create DataFrame
    df = pd.DataFrame(data)

    # Additional variable transformations for more complex distributions
    # Example: Adjust coffee price based on size
    size_price_adjustment = {'small': -0.5, 'medium': 0.0, 'large': 0.5}
    df['coffee_price'] = (df['coffee_price'] + df['coffee_size'].map(size_price_adjustment)).round(2)

    # Example: Higher tips on weekends
    is_weekend = df['weekday'].isin(['Saturday', 'Sunday'])
    df.loc[is_weekend, 'tips'] = (df.loc[is_weekend, 'tips'] * 1.5).round(2)

    return df


# Generate the dataset
coffee_shop_dataset = create_coffee_shop_dataset(500)
coffee_shop_dataset.head()
```
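
One of the analyses mentioned in the summary, the impact of weather on sales, could be sketched as follows. This is an illustrative example rather than part of the original script: the helper name `revenue_by` is hypothetical, and the net revenue formula (price × quantity × (1 − discount)) assumes that `discount_applied` is stored as a fraction, as the generator above produces it. The small `sample` frame uses hand-picked values so that the block runs without Faker.

```python
import pandas as pd


def revenue_by(df, group_column):
    """Sum net revenue per group, highest first.

    Assumption: net revenue = coffee_price * quantity * (1 - discount_applied).
    """
    net = df["coffee_price"] * df["quantity"] * (1 - df["discount_applied"])
    return net.groupby(df[group_column]).sum().round(2).sort_values(ascending=False)


# Tiny hand-made sample (illustrative values, not output of the generator)
sample = pd.DataFrame({
    "coffee_price": [3.0, 4.0, 5.0, 2.5],
    "quantity": [2, 1, 1, 4],
    "discount_applied": [0.0, 0.25, 0.0, 0.1],
    "weather_condition": ["sunny", "rainy", "sunny", "rainy"],
})

weather_revenue = revenue_by(sample, "weather_condition")
print(weather_revenue)
```

The same helper should work on the generated data, for example `revenue_by(coffee_shop_dataset, "weather_condition")` or `revenue_by(coffee_shop_dataset, "weekday")`. Because every column is drawn independently at random, any differences between groups in the synthetic data are noise rather than a real effect.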