Generating Mock Data with Python! (NumPy, Pandas, & Datetime Libraries)




In this video we write a Python script to automatically generate a sales dataset. To do this we use the NumPy, Pandas, Calendar, & Datetime libraries. This is the data that we used in my last video, “Solving real world data science problems with python pandas”.

Link to the last video:
https://youtu.be/vmEHCJofslg

Link to finished code on GitHub:
https://github.com/KeithGalli/Pandas-Data-Science-Tasks/tree/master/Misc

Useful resources!
NumPy Tutorial: https://youtu.be/GB9ByFAIAH4
Pandas Tutorial: https://youtu.be/vmEHCJofslg
Datetime library documentation: https://docs.python.org/3/library/datetime.html

Detailed video description!
We start by creating a simple dataframe and programmatically adding rows of product purchases to it. We use the random library to select these products.
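As a rough sketch of this first step (not the exact code from the video), the snippet below builds a list of rows and turns it into a dataframe. The product names, prices, weights, and column names are made up for illustration, and `random.choices` is just one way to make some products more likely than others.

```python
import random

import pandas as pd

# Hypothetical catalog: product name -> (price, relative weight).
# A higher weight means the product is picked more often.
products = {
    "iPhone": (700.00, 10),
    "USB-C Charging Cable": (11.95, 30),
    "AAA Batteries (4-pack)": (2.99, 30),
}

COLUMNS = ["Order ID", "Product", "Price Each"]


def generate_orders(n_orders, seed=None):
    """Return a dataframe with one randomly chosen product per order."""
    rng = random.Random(seed)
    names = list(products)
    weights = [products[name][1] for name in names]

    rows = []
    for order_id in range(1, n_orders + 1):
        # choices() returns a list, so take the single element
        product = rng.choices(names, weights=weights, k=1)[0]
        rows.append({
            "Order ID": order_id,
            "Product": product,
            "Price Each": products[product][0],
        })
    return pd.DataFrame(rows, columns=COLUMNS)


df = generate_orders(5, seed=0)
print(df)
```

Collecting the rows in a list and creating the dataframe once at the end tends to be much faster than appending to the dataframe one row at a time.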

We make our data more realistic by using NumPy's normal and geometric distributions to vary the number of purchases and the quantity of each item purchased.
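Here is a minimal sketch of how the two distributions could be used. The mean, standard deviation, and `p` values are assumptions picked for illustration, not the numbers from the video.

```python
import numpy as np

rng = np.random.default_rng(seed=42)


def orders_for_month(mean=12000, std=2000):
    """Draw the number of orders for one month from a normal distribution."""
    # Clamp at 0 so an unlucky draw never gives a negative order count
    return max(0, int(rng.normal(loc=mean, scale=std)))


def quantity_ordered(p=0.9, size=None):
    """Draw quantities from a geometric distribution (always >= 1).

    With p close to 1, most orders have a quantity of 1 and
    larger quantities become rarer and rarer.
    """
    return rng.geometric(p=p, size=size)


n_orders = orders_for_month()
quantities = quantity_ordered(size=1000)
print(n_orders, quantities[:10])
```

The geometric distribution fits quantities nicely because it starts at 1 and drops off quickly, so we never generate an order for zero items.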

We use the datetime library to generate thousands of different purchase times, with the most common times peaking around 12pm and 8pm.
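One possible way to get the two peaks (a sketch, not necessarily how the video does it) is to pick one of the peak hours at random and then add a normally distributed offset in minutes. The two-hour spread and the output date format are assumptions.

```python
import calendar
import random
from datetime import datetime, timedelta

PEAK_HOURS = [12, 20]    # 12pm and 8pm
SPREAD_MINUTES = 120     # assumed standard deviation around each peak


def random_order_time(year, month, rng=random):
    """Return a random datetime in the given month, clustered around the peaks."""
    days_in_month = calendar.monthrange(year, month)[1]
    day = rng.randint(1, days_in_month)

    peak_hour = rng.choice(PEAK_HOURS)
    offset = rng.gauss(0, SPREAD_MINUTES)

    # Wrap around midnight so the time always stays on the chosen day
    minute_of_day = int(round(peak_hour * 60 + offset)) % (24 * 60)
    return datetime(year, month, day) + timedelta(minutes=minute_of_day)


sample_times = [random_order_time(2019, 2, random.Random(i)) for i in range(5)]
formatted = [t.strftime("%m/%d/%y %H:%M") for t in sample_times]
print(formatted)
```

`calendar.monthrange` gives us the number of days in the month, so February never ends up with a 30th.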

We take a list of the most common US street names to help us randomly generate an address for each purchase.
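A small sketch of the idea is below. The street names, cities, zip codes, and address format are placeholder values for illustration only, not the lists used in the video.

```python
import random

# Placeholder lists; swap in whatever streets and cities you want
street_names = ["Main", "2nd", "1st", "Park", "Oak", "Maple", "Elm", "Washington"]
cities = [
    ("San Francisco", "CA", "94016"),
    ("Boston", "MA", "02215"),
    ("Austin", "TX", "73301"),
]


def random_address(rng=random):
    """Build a fake address like '123 Main St, Boston, MA 02215'."""
    number = rng.randint(1, 999)
    street = rng.choice(street_names)
    city, state, zip_code = rng.choice(cities)
    return f"{number} {street} St, {city}, {state} {zip_code}"


address = random_address(random.Random(0))
print(address)
```

Keeping the city, state, and zip code together in one tuple means we never generate a mismatched combination.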

Hope you guys enjoy! Make sure to subscribe if you haven’t already 🙂

⭐ Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I’ve been using Kite for 6 months and I love it! https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=keithgalli&utm_content=description-only

———————————————

Follow me on social media!
Instagram | https://www.instagram.com/keithgalli/
Twitter | https://twitter.com/keithgalli

———————————————

Today’s merch!
Creator: @Chris Chann
Website: https://unsatisfied.co/

———————————————

Video Timeline!
0:00 – Intro & Background Info
1:15 – What we’re creating in this video!
2:03 – Start writing code (generating a simple dataframe & CSV)
8:26 – Task: Making our data more realistic, selecting some products with higher probability than others
14:15 – Task: Generate 12 months' worth of data in 12 CSVs (calendar library, f-strings)
18:12 – Make some months have more purchases than others
19:28 – Normal distributions in NumPy
23:43 – Improving speed of our code (making testing easier)
26:41 – Task: Generate random addresses for our data
35:03 – Task: Generate order times for purchases (datetime library overview)
40:02 – Using timedelta objects to add & subtract time from dates
45:09 – Generate a realistic quantity ordered for each product (using numpy geometric distribution)
49:38 – Make certain items more likely to be sold together & clean up the code a bit
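For the 14:15 task, here is a hedged sketch of how the calendar library and f-strings can combine to write one CSV per month. The file name pattern and the empty placeholder dataframe are assumptions, and the example writes into a temporary folder so it doesn't clutter your working directory.

```python
import calendar
import tempfile
from pathlib import Path

import pandas as pd


def write_monthly_csvs(output_dir, year=2019):
    """Write one CSV per month and return the list of file paths."""
    output_dir = Path(output_dir)
    paths = []
    for month in range(1, 13):
        # calendar.month_name[1] is "January", [2] is "February", ...
        month_name = calendar.month_name[month]

        # Placeholder: in a real script this would hold the month's generated orders
        df = pd.DataFrame({"Order ID": [], "Product": []})

        path = output_dir / f"Sales_{month_name}_{year}.csv"
        df.to_csv(path, index=False)
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    written = [p.name for p in write_monthly_csvs(tmp)]

print(written)
```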

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.


Comment List

  • Keith Galli
    December 6, 2020

    Faker is a pretty hefty library for fake data generation, including region localization

  • Keith Galli
    December 6, 2020

    I generated the month-by-month data using a dictionary and assigned weights. Is that OK?
    Anyone, please let me know.

  • Keith Galli
    December 6, 2020

    Enjoyed the video and watching your thought process. Picked up a few tips, as I recently found myself generating a large amount of fake customer data. As for the city/state/zip section, I found an old zip code file on the Internet with the corresponding city/state (>40,000 records) that I converted to SQLite. This made it easy to generate valid city/state/zip data. You may want to check out the Python mimesis (Fake Data Generator) package which, for my use case, drastically reduced my run time for generating one million (16 column) records from ~30 min to 1 min 14 sec, and that includes accessing the external SQLite zip code database.

  • Keith Galli
    December 6, 2020

    Why are there so few views compared to other videos?

  • Keith Galli
    December 6, 2020

    Still on the journey of going through your entire data science playlist. Eager to see what I am about to learn in this video.

  • Keith Galli
    December 6, 2020

    [product for product in products ]

    Is the first "product" what the for loop is "returning"?

  • Keith Galli
    December 6, 2020

    Hi,
    please explain how to use a SQL table in Python using Pandas!

  • Keith Galli
    December 6, 2020

    What IDE are you using in the video?

  • Keith Galli
    December 6, 2020

    Best follow-along tutorials on Python Libraries ! Thanks Keith !

  • Keith Galli
    December 6, 2020

    Brilliant, especially when you throw some real statistics into it (random, distributions, etc.). It will be very helpful in building simulations for model testing. Thanks!

  • Keith Galli
    December 6, 2020

    I would like to learn 3D game creation using Python, or 3D app creation using panda, in one video. If it's a long video, I'm OK with that too.

  • Keith Galli
    December 6, 2020

    Great job, Keith! Thanks a lot!

  • Keith Galli
    December 6, 2020

    23:41 Good time to do some profiling... snakeviz or cProfile

  • Keith Galli
    December 6, 2020

    Hi Keith,
    I have a question about extracting a time value (for example, 6.25.37) from an Excel sheet in Python.
    How do I get the proper output?

  • Keith Galli
    December 6, 2020

    Keith – Great video really helpful!

  • Keith Galli
    December 6, 2020

    Hey Keith, can you make a dedicated video on Time Series Analysis, please?

  • Keith Galli
    December 6, 2020

    Hey man, appreciate your videos. However, could you do a video on how to scrape data from web pages? I had to scrape data from a Russian web page and I could not. Thanks again.

  • Keith Galli
    December 6, 2020

    Send me the background music

  • Keith Galli
    December 6, 2020

    Hi Keith, great video, I'm learning to be a Python developer, so for practice, I've taken your base code and refactored it. Is there any way I could send you the code and get your honest feedback/critique on it? I'm finding it very difficult to get a job and would appreciate any feedback. Thanks.

  • Keith Galli
    December 6, 2020

    You rock. Thank you.

  • Keith Galli
    December 6, 2020

    Awesome!!

  • Keith Galli
    December 6, 2020

    Thank you, friend.

  • Keith Galli
    December 6, 2020

    Love you baby…

  • Keith Galli
    December 6, 2020

    Ohh my god… you are cool…!! awesome…
