Coffee Shop Dataset

![[Formales]]

![[coffee_shop_data_set__alvcoffeestock_007.png]]
[Coffee Shop Business Model In A Nutshell](https://fourweekmba.com/how-to-increase-sales-for-your-local-business-coffee-shop/)<br> Author: *Gennaro Cuofano*

> [!TLDR] Coffee Shop Dataset
> - The synthetic coffee shop dataset contains 500 records and 21 variables that reflect attributes one might realistically observe in a coffee shop's transactional data.
> - The dataset can be used for analyses such as customer behavior, sales trends, and the impact of weather on sales, giving a realistic context for understanding business operations in a coffee shop.

# Variables
![[coffee_shop_dataset_variables__alvkaffee_064.png]]

> [!multi-column]
>
>> [!NOTE] transaction_id
>> A unique identifier for each transaction.
>
>> [!NOTE] customer_name
>> The name of the customer making the purchase.

> [!multi-column]
>
>> [!NOTE] customer_email
>> The email address of the customer.
>
>> [!NOTE] purchase_date
>> The date when the purchase was made.

> [!multi-column]
>
>> [!NOTE] purchase_time
>> The time when the purchase was made.
>
>> [!NOTE] coffee_type
>> The type of coffee purchased (e.g., espresso, latte).

> [!multi-column]
>
>> [!NOTE] coffee_size
>> The size of the coffee purchased (small, medium, large).
>
>> [!NOTE] coffee_price
>> The price of the coffee, adjusted by size (small: -0.50, large: +0.50).

> [!multi-column]
>
>> [!NOTE] quantity
>> The number of coffee items purchased in the transaction.
>
>> [!NOTE] barista_name
>> The name of the barista who prepared the coffee.

> [!multi-column]
>
>> [!NOTE] payment_type
>> The method of payment used (e.g., cash, credit card, mobile payment).
>
>> [!NOTE] rating
>> The customer's rating of their experience, from 1 to 5.

> [!multi-column]
>
>> [!NOTE] weather_condition
>> The weather condition at the time of purchase (e.g., sunny, rainy).
>
>> [!NOTE] temperature
>> The temperature at the time of purchase.

> [!multi-column]
>
>> [!NOTE] weekday
>> The day of the week when the purchase was made.
>
>> [!NOTE] is_member
>> Indicates whether the customer is a member of the coffee shop's loyalty program.

> [!multi-column]
>
>> [!NOTE] discount_applied
>> The discount applied to the purchase, as a fraction between 0 and 0.25 (0 if no discount was given).
>
>> [!NOTE] tips
>> The tip amount given; tips on Saturdays and Sundays are scaled up by a factor of 1.5.

> [!multi-column]
>
>> [!NOTE] feedback
>> The customer's feedback about their experience, in one sentence.
>
>> [!NOTE] seating_area
>> The preferred seating area of the customer at the time of purchase (indoor, outdoor).

> [!multi-column]
>
>> [!NOTE] wifi_usage
>> The level of WiFi usage by the customer during their visit (none, light, moderate, heavy).
>
>> [!BLANK]
>>

# Python
![[coffee_shop_dataset_python__alvkaffee_059.png]]

```python
from faker import Faker
import random
import pandas as pd
import numpy as np

# Initialize Faker
fake = Faker()

# Define function to create a dataset for the coffee shop
def create_coffee_shop_dataset(num_records):
    # Draw the purchase dates first so that weekday can be derived from them
    # (an independent fake.day_of_week() would not match purchase_date)
    purchase_dates = [fake.date_between(start_date='-2y', end_date='today') for _ in range(num_records)]

    # Data variables and their distributions
    data = {
        "transaction_id": [fake.unique.uuid4() for _ in range(num_records)],
        "customer_name": [fake.name() for _ in range(num_records)],
        "customer_email": [fake.email() for _ in range(num_records)],
        "purchase_date": purchase_dates,
        "purchase_time": [fake.time() for _ in range(num_records)],
        "coffee_type": [random.choice(['espresso', 'latte', 'cappuccino', 'americano', 'mocha']) for _ in range(num_records)],
        "coffee_size": [random.choice(['small', 'medium', 'large']) for _ in range(num_records)],
        "coffee_price": [round(random.uniform(2.5, 5.0), 2) for _ in range(num_records)],
        "quantity": [random.randint(1, 5) for _ in range(num_records)],
        "barista_name": [fake.first_name() for _ in range(num_records)],
        "payment_type": [random.choice(['cash', 'credit card', 'mobile payment']) for _ in range(num_records)],
        "rating": [random.choice(range(1, 6)) for _ in range(num_records)],
        "weather_condition": [random.choice(['sunny', 'cloudy', 'rainy', 'snowy']) for _ in range(num_records)],
        "temperature": [round(random.uniform(20, 30), 1) for _ in range(num_records)],
        "weekday": [pd.Timestamp(d).day_name() for d in purchase_dates],
        "is_member": [random.choice([True, False]) for _ in range(num_records)],
        # A discount (fraction of the price) is applied to roughly 20% of purchases
        "discount_applied": [round(random.uniform(0, 0.25), 2) if random.random() > 0.8 else 0 for _ in range(num_records)],
        # Roughly half of the customers leave a tip
        "tips": [round(random.uniform(0, 5), 2) if random.random() > 0.5 else 0 for _ in range(num_records)],
        "feedback": [fake.sentence() for _ in range(num_records)],
        "seating_area": [random.choice(['indoor', 'outdoor']) for _ in range(num_records)],
        "wifi_usage": [random.choice(['none', 'light', 'moderate', 'heavy']) for _ in range(num_records)],
    }

    # Create DataFrame
    df = pd.DataFrame(data)

    # Additional variable transformations for more complex distributions
    # Example: Adjust coffee price based on size
    size_price_adjustment = {'small': -0.5, 'medium': 0.0, 'large': 0.5}
    df['coffee_price'] = (df['coffee_price'] + df['coffee_size'].map(size_price_adjustment)).round(2)

    # Example: Higher tips on weekends
    is_weekend = df['weekday'].isin(['Saturday', 'Sunday'])
    df['tips'] = np.where(is_weekend, df['tips'] * 1.5, df['tips']).round(2)

    return df

# Generate the dataset
coffee_shop_dataset = create_coffee_shop_dataset(500)
coffee_shop_dataset.head()
```
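
The TLDR mentions analyses such as sales trends and the impact of weather on sales. The following is a minimal sketch of one way to start, not part of the original note. It assumes that `coffee_price` is a per-item price and that `discount_applied` is a fraction taken off the whole transaction. The `revenue` column and the helper names `add_revenue` and `revenue_by` are hypothetical, and the four sample rows are hand-written for illustration only.

```python
import pandas as pd


def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a hypothetical `revenue` column."""
    out = df.copy()
    # Assumption: price is per item, discount is a fraction of the total
    out["revenue"] = (
        out["coffee_price"] * out["quantity"] * (1 - out["discount_applied"])
    ).round(2)
    return out


def revenue_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Total revenue, mean revenue, and transaction count per category."""
    return (
        add_revenue(df)
        .groupby(column)["revenue"]
        .agg(total="sum", mean="mean", transactions="count")
        .sort_values("total", ascending=False)
    )


# Hand-written illustrative rows that reuse the dataset's column names
sample = pd.DataFrame({
    "coffee_price": [3.00, 4.50, 2.50, 4.00],
    "quantity": [2, 1, 4, 1],
    "discount_applied": [0.0, 0.2, 0.0, 0.25],
    "weather_condition": ["sunny", "rainy", "sunny", "rainy"],
})

by_weather = revenue_by(sample, "weather_condition")
print(by_weather)
```

The same helper can be pointed at the generated data, for example `revenue_by(coffee_shop_dataset, "weekday")`. Because the generator draws each column independently, any differences between categories in the generated data reflect random variation only.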