Python Pandas Tutorial (Part 10): Working with Dates and Time Series Data
In this video, we will be learning how to work with DateTime and Time Series data in Pandas.
This video is sponsored by Brilliant. Go to https://brilliant.org/cms to sign up for free. Be one of the first 200 people to sign up with this link and get 20% off your premium subscription.
In this Python Programming video, we will be learning several different concepts about working with DateTimes and Time Series data in Pandas. We will learn how to convert values to datetimes, how to filter by dates, how to resample our dates to do some more in-depth analysis, and more. Let’s get started…
The code for this video can be found at:
http://bit.ly/Pandas-10
StackOverflow Survey Download Page – http://bit.ly/SO-Survey-Download
Datetime Formatting Codes – http://bit.ly/python-dt-fmt
Pandas Date Offset Codes – http://bit.ly/pandas-dt-fmt
✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms
✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join
✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY
✅ Cryptocurrency Donations:
Bitcoin Wallet – 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet – 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet – MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot
✅ Corey’s Public Amazon Wishlist
http://a.co/inIyro1
✅ Equipment I Use and Books I Recommend:
https://www.amazon.com/shop/coreyschafer
▶️ You Can Find Me On:
My Website – http://coreyms.com/
My Second Channel – https://www.youtube.com/c/coreymschafer
Facebook – https://www.facebook.com/CoreyMSchafer
Twitter – https://twitter.com/CoreyMSchafer
Instagram – https://www.instagram.com/coreymschafer/
#Python #Pandas
Hope you are all staying safe! In this video, we'll be learning a lot about working with dates and time-series data in Pandas, and we'll also look at some basic plotting. In the next video, we'll be learning how to load data into Pandas from different sources (Excel, JSON, SQL, etc.). Let me know if there is anything else you'd like me to cover in the Pandas series. I will likely take a break from this series after the next two videos are released, just so I can focus on some different topics.
Corey, do you have a document with all of the commands that you used in this series?
For anyone getting the following warning while loading the CSV file: FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version.
We can use to_datetime instead:
import pandas as pd
def d_parser(s):
    return pd.to_datetime(s, format='%Y-%m-%d %I-%p')
df2 = pd.read_csv(r'C:Users…eth_1h.csv', parse_dates=['Date'], date_parser=d_parser)
Optionally, you can also handle values that fail to parse, e.g. pd.to_datetime(…, errors='coerce').
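A minimal sketch of what errors='coerce' does, assuming the ETH-style format '%Y-%m-%d %I-%p' used above; the two sample strings are made up for illustration. A value that does not match the format becomes NaT instead of raising an exception.

```python
import pandas as pd

# Hypothetical sample values: the first matches the format, the second does not
raw = pd.Series(["2020-03-13 08-PM", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y-%m-%d %I-%p", errors="coerce")
print(parsed)
```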
Hi, what if I want to use only the time ('%H:%M:%S') as the index? I just can't figure it out. Help me, please!!
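One possible approach, sketched with made-up sample rows: .dt.time gives an index of plain datetime.time objects, but because the date is discarded, rows from different days can end up with the same index value. If the real goal is filtering by time of day, keeping the full DatetimeIndex and using between_time() may be simpler.

```python
import pandas as pd

# Hypothetical sample data standing in for the tutorial's Date/Close columns
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2020-03-13 08:00:00", "2020-03-13 20:30:15", "2020-03-14 09:15:00"]
        ),
        "Close": [100.0, 101.0, 102.0],
    }
)

# Option 1: index by time of day only (datetime.time objects; the date is dropped)
by_time = df.drop(columns="Date").set_index(df["Date"].dt.time).rename_axis("Time")

# Option 2: keep the full DatetimeIndex and filter on time of day instead
by_datetime = df.set_index("Date")
morning = by_datetime.between_time("07:00", "10:00")

print(by_time)
print(morning)
```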
Hi, I'm new to Pandas and just a bit confused:
First, at 6:36, we converted all the values of the Series from string type to datetime type using:
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %I-%p')
but when doing the same thing while loading the CSV at 10:24,
we used a lengthier approach: a lambda assigned to the date_parser argument.
So my question is: with the latter approach, do we basically apply a function (a lambda in this case) to the 'Date' column, so that it is called on each value in that column and converts each value (a string) by calling strptime()?
If so, can we assume that to_datetime() converts a whole Series to datetime type at once, while the latter approach converts each value of the Series to a datetime object individually?
Is there any way we can replicate the former approach while loading the CSV file? It looks minimal and easy.
Thanks in advance
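On the last question, one way to keep the single to_datetime() call is to load the column as plain strings and convert it right after read_csv. The CSV text below is a made-up stand-in for ETH_1h.csv. In pandas 2.0 and later, read_csv also accepts a date_format argument (e.g. parse_dates=['Date'], date_format='%Y-%m-%d %I-%p') and date_parser is deprecated there; the post-load conversion sketched here also works on older versions.

```python
import io

import pandas as pd

# Made-up rows in the same "YYYY-MM-DD HH-AM/PM" format as the tutorial's Date column
csv_text = """Date,Close
2020-03-13 08-PM,100.0
2020-03-13 07-PM,99.0
"""

# Load the dates as plain strings first...
df = pd.read_csv(io.StringIO(csv_text))

# ...then convert the whole column in one vectorized call
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d %I-%p")
print(df.dtypes)
```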
Good job, thanks
@CoreySchafer, you're awesome. Loved your quality video content. Your work is fabulous.
I'd really like to see DevOps related content if it comes under your scope. Thanks 👍
Please, how do I get the ETH dataset used in this video? I did not see it in the description. Thank you
How are you able to select rows by typing only part of the index value, as when you used df['2019']?
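This is pandas' partial string indexing on a DatetimeIndex: a string such as '2019' is treated as the whole period it describes, so every row that falls inside that year is selected. A small sketch with made-up daily data on a sorted index; it uses .loc, since selecting rows with df['2019'] directly was deprecated in pandas 1.2.

```python
import pandas as pd

# Made-up daily data spanning the end of 2018 and the start of 2019
idx = pd.date_range("2018-12-30", periods=5, freq="D")
df = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# "2019" covers the whole year, so all rows dated in 2019 are returned
rows_2019 = df.loc["2019"]

# The same works at other resolutions, e.g. a single month
rows_dec_2018 = df.loc["2018-12"]
print(rows_2019)
```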
Great video! One question: when you filter by year
filt = (df['Date'] >= '2019')
how does pandas know that you mean the year part of the date column?
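Pandas converts the string to a Timestamp before comparing: '2019' is parsed as pd.Timestamp('2019-01-01 00:00:00'), so the filter really means "on or after midnight on January 1, 2019". A quick check with made-up values:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2018-12-31 23:00", "2019-01-01 00:00", "2019-06-15 12:00"]))

# The string "2019" is turned into a Timestamp at 2019-01-01 00:00:00
filt = dates >= "2019"
same = dates >= pd.Timestamp("2019-01-01 00:00:00")

print(filt.tolist())
print(filt.equals(same))
```

One consequence: an upper bound like df['Date'] < '2020' includes all of 2019, whereas df['Date'] <= '2019' stops at 2019-01-01 00:00 rather than covering the whole year.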
Raise your hands if you also got stuck due to an error not known to you only to find out that you used the wrong brackets for a function or assignment.
I got stuck at least 10 times. LOL.
Is this Theo Von talking?
You value your viewers' time. I'm not sure if you really speak this fast or just speed up the first and last few minutes of the video.
I just like the way it goes. Thanks for the great tutorial.
Always the best, wonderful.
How do I add a number of days to a date in Pandas?
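A minimal sketch with made-up dates and a hypothetical Days column: adding a pd.Timedelta shifts every value by the same amount, and pd.to_timedelta() turns a column of integers into per-row offsets. 2020 is a leap year, so February 29 is counted.

```python
import pandas as pd

# Made-up dates plus a hypothetical column of day counts
df = pd.DataFrame({"Date": pd.to_datetime(["2020-01-30", "2020-02-28"]), "Days": [1, 10]})

# Same offset for every row
df["Plus3"] = df["Date"] + pd.Timedelta(days=3)

# A different number of days per row, taken from another column
df["PlusDays"] = df["Date"] + pd.to_timedelta(df["Days"], unit="D")
print(df)
```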
What should I do if I want to count, for every day, the number of times High crosses a specific value?
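One way to sketch this, assuming "crosses" means moving from at or below the level to above it between consecutive rows; the prices and the threshold of 100 are made up. The boolean Series is resampled into daily bins and summed, since True counts as 1.

```python
import pandas as pd

# Made-up hourly High values over two days
idx = pd.to_datetime(
    [
        "2020-03-12 00:00", "2020-03-12 01:00", "2020-03-12 02:00", "2020-03-12 03:00",
        "2020-03-13 00:00", "2020-03-13 01:00", "2020-03-13 02:00",
    ]
)
high = pd.Series([90.0, 105.0, 95.0, 110.0, 120.0, 98.0, 101.0], index=idx, name="High")
threshold = 100.0  # hypothetical level of interest

above = high > threshold
prev_above = above.shift(1, fill_value=False).astype(bool)

# An upward crossing: above now, but not above on the previous row
crossed_up = above & ~prev_above

crossings_per_day = crossed_up.resample("D").sum()
hours_above_per_day = above.resample("D").sum()
print(crossings_per_day)
```

Because the shift runs over the whole Series, the first row of each day is compared with the last row of the previous day, so a move that carries across midnight is not counted as a new crossing.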
Corey this series is fantastic, thank you!
How can I strip the time from a datetime and display only the date in the column? I don't want to display the time. Every time I try this, I get an error!
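A few options, sketched with made-up timestamps. They all assume the column is already datetime64; one common cause of errors here is that the column still holds strings, in which case .dt raises an AttributeError until the column is converted with pd.to_datetime().

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2020-03-13 20:00", "2020-03-14 09:15"])})

# Plain datetime.date objects (object dtype); the time part is gone
df["DateOnly"] = df["Date"].dt.date

# Still datetime64, with the time reset to midnight (keeps .dt and resampling available)
df["Day"] = df["Date"].dt.normalize()

# A formatted string, purely for display
df["DateText"] = df["Date"].dt.strftime("%Y-%m-%d")
print(df)
```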
I can't find the eth_1h file
Thanks a lot, Corey, but I have a problem saving the CSV file properly. I copied it from your GitHub and saved it in an Excel sheet with a .csv extension. When I load it in Jupyter, it comes in with commas and as a single column. Any help?
Hi all, at minute 20:30, when filtering df['2020-01':'2020-02'] I get an empty DataFrame as the result. However, if I reverse the order of the filter to df['2020-02':'2020-01'], it works. Is there something wrong with this? I didn't sort any differently than in Corey's video, but it seems that I need to filter following the order of the index to get the correct result. Any comments? Thanks! (Ricardo from Argentina)
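A likely explanation, consistent with the sort_index() fix mentioned in a later comment: the ETH data appears to be stored newest first, so the index is in descending order and date slices follow that order. Sorting the index in ascending order first lets the slice be written oldest to newest. A sketch with a made-up, newest-first daily index:

```python
import pandas as pd

# Made-up daily data, stored newest first (a descending DatetimeIndex)
idx = pd.date_range("2019-12-30", "2020-02-02", freq="D")[::-1]
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# Sort ascending so the slice bounds can be written oldest-to-newest
df = df.sort_index()
jan_feb = df.loc["2020-01":"2020-02"]
print(jan_feb.head())
```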
This Dude is IMBA.
Pandas plotting will be useful and interesting! Looking forward to it! Thanks, Corey!
Please make a tutorial about time decay, for example, when stock information comes in one month late.
Are you also the Talking Thrones guy? Asking for a friend…
Your videos are awesome and I learn a lot, but here is a problem:
how do I subtract one week, month, or year from a given date? For example, if we want to know the mean over the last month, week, or year, how can we do it?
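A sketch with made-up daily closing prices: pd.DateOffset steps back by calendar weeks, months, or years (so one month before March 13 is February 13, regardless of month lengths), and a boolean filter on the index then selects the window to average.

```python
import pandas as pd

end = pd.Timestamp("2020-03-13 20:00")
one_week_back = end - pd.DateOffset(weeks=1)
one_month_back = end - pd.DateOffset(months=1)
one_year_back = end - pd.DateOffset(years=1)
print(one_week_back, one_month_back, one_year_back)

# Made-up daily closing prices: 0.0, 1.0, 2.0, ...
idx = pd.date_range("2020-01-01", "2020-03-13", freq="D")
close = pd.Series(range(len(idx)), index=idx, dtype=float)

# Mean over the month ending at the last available date (start date excluded)
last = close.index.max()
last_month = close[close.index > last - pd.DateOffset(months=1)]
print(last_month.mean())
```

Swapping months=1 for weeks=1 or years=1 in the filter gives the other windows.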
At 20:08, calling df['2019'] didn't work for me. It gave an AssertionError. df.loc['2019'] worked, though.
Hi Corey, I am not able to do the slicing operation in my Jupyter Notebook. It returns an empty result, even though the dates are available inside the DataFrame when I check. I'm using the same CSV file:
time_date_df['2020-01':'2020-02']  # time_date_df is the DataFrame name; pandas version 1.0.1
Awesome teacher you are Corey. Wish I had a teacher like you years ago … some teach a new skill with such grace but Legends take it a level higher when they show you what you could achieve with the right teaching…take a bow master…luv and respect from India. thank you for inspiring me
import pandas as pd
url = 'https://raw.githubusercontent.com/CoreyMSchafer/code_snippets/master/Python/Pandas/10-Datetime-Timeseries/ETH_1h.csv'
df = pd.read_csv(url)
For anyone who is unable to download and use the CSV file, just use the code above to get started!
df["Date"].dt.day_name() works too, but the lambda function looks cool. Is there any reason why you use a lambda rather than dt?
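Both give the same result here; the difference is mainly that .dt.day_name() works on the whole column at once, while apply() with a lambda calls a Python function once per row, which tends to be slower on large frames. A quick check with made-up dates (March 13, 2020 was a Friday):

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-03-13", "2020-03-14"]))

via_dt = dates.dt.day_name()                     # vectorized accessor
via_apply = dates.apply(lambda d: d.day_name())  # one Python call per element

print(via_dt.tolist(), via_apply.tolist())
```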
…. thank you Corey. Amazing videos . I am learning a lot.
Where is the CSV file?
Hello Corey! Excellent videos and teaching style! One comment: I am not sure I understand 100% what we are doing with:
df['Date'].dt.day_name()
Why do we need .dt? I am trying to relate that concept to classes, OOP, and maybe inheritance, but I don't get it.
Thanks
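.dt is an accessor: an attribute of the Series that returns a helper object grouping the datetime-specific methods (hour, day_name(), strftime(), ...). It works through composition rather than inheritance, much like .str groups the string methods. Because it only makes sense for datetime-like values, pandas raises an AttributeError when it is used on a column that still holds strings. A small demonstration with made-up values:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2020-03-13 20:00"]))
print(dates.dt.hour)          # datetime methods live under .dt
print(dates.dt.day_name())

text = pd.Series(["2020-03-13 20:00"])  # still plain strings
print(text.str.len())         # string methods live under .str

try:
    text.dt.hour
except AttributeError as exc:
    print("AttributeError:", exc)
```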
The concept of resample is so damn crazy!!! I struggle so much with this kind of aggregation in other systems, and it's so easy in pandas…
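For example, a sketch that turns made-up 12-hourly rows into daily bars, assuming Open/High/Low/Close/Volume column names like the tutorial's data; agg() lets each column use its own aggregation.

```python
import pandas as pd

# Made-up 12-hourly price rows over two days
idx = pd.to_datetime(
    ["2020-03-12 00:00", "2020-03-12 12:00", "2020-03-13 00:00", "2020-03-13 12:00"]
)
df = pd.DataFrame(
    {
        "Open": [100.0, 104.0, 98.0, 95.0],
        "High": [105.0, 106.0, 99.0, 97.0],
        "Low": [99.0, 103.0, 94.0, 93.0],
        "Close": [104.0, 98.0, 95.0, 96.0],
        "Volume": [10.0, 20.0, 30.0, 40.0],
    },
    index=idx,
)

# One row per day, with a different aggregation for each column
daily = df.resample("D").agg(
    {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
)
print(daily)
```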
Corey, you are great!
Hey Corey,
I started watching your Pandas tutorial series and I still don't really see which methods in the API I'm supposed to use to achieve my desired results. I have a data frame like so:
Date Change
2010-08-25 0.08
2010-08-26 -0.22
2010-08-27 0.04
2010-08-30 -0.08
2010-08-31 -0.11
… …
2020-08-18 0.96
2020-08-19 -1.79
2020-08-20 5.04
2020-08-21 -0.84
2020-08-24 -1.10
The Date column is the index, of course. What I want to do is basically partition this data by year. Once it's partitioned by year, I want to group consecutive rows by the sign of the Change column, so that consecutive negatives and consecutive positives are grouped together. Once that is done, I want to get the overlap of date ranges with matching sign across all years. For example, if Change is positive from 2010-08-25 to 2010-08-27 and from 2011-08-26 to 2011-08-29, the common overlap would be 08-26 to 08-27, obviously accounting for all years, not just two. At that point, once I have the common date ranges and their values, I want to average all of the numbers in each range, so that at the end I have ranges of dates for which Change is consistently positive or negative, along with the average change for each range. How can I achieve this?
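A sketch of only the first two steps (partitioning by year and grouping consecutive rows with the same sign), using the sample rows shown in the comment; the cross-year overlap step is not covered. The idea is to flag a new run whenever the sign or the year changes, then cumsum() those flags so each run gets its own id to group by.

```python
import numpy as np
import pandas as pd

dates = pd.DatetimeIndex(
    ["2010-08-25", "2010-08-26", "2010-08-27", "2010-08-30", "2010-08-31",
     "2020-08-18", "2020-08-19", "2020-08-20", "2020-08-21", "2020-08-24"],
    name="Date",
)
df = pd.DataFrame(
    {"Change": [0.08, -0.22, 0.04, -0.08, -0.11, 0.96, -1.79, 5.04, -0.84, -1.10]},
    index=dates,
)

sign = np.sign(df["Change"])
year = pd.Series(df.index.year, index=df.index)

# A new run starts whenever the sign flips or the year changes
new_run = (sign != sign.shift()) | (year != year.shift())
run_id = new_run.cumsum()

# One row per run: its year, sign, first and last date, and mean change
runs = (
    df.assign(Year=year, Sign=sign, Run=run_id)
    .reset_index()
    .groupby("Run")
    .agg(
        Year=("Year", "first"),
        Sign=("Sign", "first"),
        Start=("Date", "first"),
        End=("Date", "last"),
        MeanChange=("Change", "mean"),
    )
)
print(runs)
```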
I'm sad there's only one tutorial left 🙁
Thanks Corey for this tutorial
I really like how you explain things so clearly. Super helpful for my project. Thank you so much!
Can you do a complete EDA on a dataset from Kaggle? Thanks for this video.
Hi Corey, thank you so much for videos, you are doing an amazing job.
At 32:35 I got this error: 'DataFrame' object has no attribute 'resamlpe'
I'm happy that I found an awesome channel to learn Python for Data Science easily!!!! Very nice and crystal-clear explanation!!👌🙂
19:35 -> df['2019']
AssertionError: <class 'numpy.ndarray'>
Solution: https://github.com/pandas-dev/pandas/issues/35509
"Before 1.1.0, you didn't have to sort the index before slicing. Now, you have to run sort_index()"
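Correct code:
df.sort_index(inplace=True)
df.loc['2019']  # df['2019'] worked at the time, but selecting rows with df[string] was deprecated in pandas 1.2, so .loc is the safer choice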