Separate Excel Data into Workbooks by Column Values – Python Pandas Tutorial




[ad_1]

Hey Everyone! In this one we’ll talk a look at how we can break down a workbook into multiple other workbooks depending ont he value inside a column.

We do it here for all unique values in one of our columns, but you could do the conditional indexing any way you like. I hope this shows you the baseline of how to do it!

Support the Channel on Patreon —
https://www.patreon.com/join/derricksherrill
Join The Socials —
Reddit – https://www.reddit.com/r/CodeWithDerrick/
FB – https://www.facebook.com/CodeWithDerrick/
Insta – https://www.instagram.com/codewithderrick/
Twitter – https://twitter.com/codewithderrick
LinkedIn – https://www.linkedin.com/in/derricksherrill/
GitHub – https://github.com/Derrick-Sherrill
*****************************************************************
Workbook –
https://drive.google.com/file/d/12p1q_UlQP7E8O2ECAZIp07ZZfaC12Ht5/view?usp=sharing

Full code from the video:
import pandas as pd

excel_file_path = ‘training_status.xlsx’

df = pd.read_excel(excel_file_path)
# print(df)

split_values = df[‘Shift’].unique()
# print(split_values)

for value in split_values:
df1 = df[df[‘Shift’] == value]
output_file_name = “Shift_” + str(value) + “_Trainings.xlsx”
df1.to_excel(output_file_name, index=False)

https://github.com/Derrick-Sherrill/DerrickSherrill.com/blob/master/separate_by_value.py

Packages (& Versions) used in this video:
Pandas 0.25.0
Python 3.8

*****************************************************************
Code from this tutorial and all my others can be found on my GitHub:
https://github.com/Derrick-Sherrill/DerrickSherrill.com

Check out my website:
https://www.derricksherrill.com/

If you liked the video – please hit the like button. It means more than you know. Thanks for watching and thank you for all your support!!

— Channel FAQ —

What text editor do you use?
Atom – https://atom.io/

What Equipment do you use to film videos?
https://www.amazon.com/shop/derricksherrill

What editing software do you use?
Adobe CC – https://www.adobe.com/creativecloud.html
Premiere Pro for video editing
Photoshop for images
After Effects for animations

Do I have any courses available?
Yes & always working on more!
https://www.udemy.com/user/derrick-sherrill-2/

Where do I get my music?
I get all my music from the copyright free Youtube audio library
https://www.youtube.com/audiolibrary/music?nv=1

Let me know if there’s anything else you want answered!

————————-

Always looking for suggestions on what video to make next — leave me a comment with your project! Happy Coding!

Source


[ad_2]

Comment List

  • Derrick Sherrill
    December 22, 2020

    Thank you Derrick.

  • Derrick Sherrill
    December 22, 2020

    This is exactly what I needed; thank you

  • Derrick Sherrill
    December 22, 2020

    Thank you for this video..python code looks more less and easy than VBA ..

  • Derrick Sherrill
    December 22, 2020

    Hi
    How we can match data of multiple columns( as per column headers) with a base file ? I Meant, instead of making so many Vlookups…

  • Derrick Sherrill
    December 22, 2020

    Hi, can we output the data to the same file but to separate sheets?

  • Derrick Sherrill
    December 22, 2020

    Hey Derrick,

    Great short and well explained video with real working life issues! Thanks!

  • Derrick Sherrill
    December 22, 2020

    Hello Derrick. Thanks for the video. Can you explain (in the example you presented) how in above to create different sheets in same excel file instead of creating 3 different excel files. Thank you.

  • Derrick Sherrill
    December 22, 2020

    Please make a video of excel file in python for calculation of stock inventory with profit & loss i

  • Derrick Sherrill
    December 22, 2020

    Hey Derrick, Thanks for this awesome concept. But i have one doubt, can you please confirm me where we have to locate this file i mean in that same folder where python is installed or python will automatically detect this file from system?

    Thank You 🙂

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick! Your videos are excellent. I have subscribed to your channel. I've watched a few and searched in the web but I am still not sure about the following problem. Comparing two excel files that have different rows and columns and the number of columns and rows are different too. I've tried merge but I have not been able to fully understand the output. One file has 1,013 rows, the other 411. But when I merged the two of them using outer (as per one of your videos using a common column), the output gives me 1,035. I know that python creates dup rows under certain conditions but I have not found why and, more importantly, how to find them.

    In essence, I want to compare the two files and clearly breakdown all the rows that are equal, and clearly identify the ones that are different for reconciliation…..There are a few videos that deal with this by even adding the two different values in the same cells which will solve my problem. However, they are constrained by having to have the exact same number of columns and rows as well as column names. Your help is appreciated!

  • Derrick Sherrill
    December 22, 2020

    I have been working on a text file which i open in excel and it gives 5 values in a single cell and for entire colum i am getting such results how to sivide that sigle column in 5 columns?

  • Derrick Sherrill
    December 22, 2020

    thank you! helped me a lot.
    Question: what if I need to filter in two columns?

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick, thanks for your videos!
    How to use pandas and probably numpy min function to compare 3 excel columns with different values and find out the minimum value in each row? I use these code as a starting point:

    filename = 'Supplier_prices_comparison.xlsx'

    df1 = pd.read_excel(filename)

    a = df1.iloc[1:-1, 7]

    b = df1.iloc[1:-1, 10]

    c = df1.iloc[1:-1, 13]

    # I don't know what to do next. If you can please help me!

  • Derrick Sherrill
    December 22, 2020

    Thanks Derrick. Very clear.

  • Derrick Sherrill
    December 22, 2020

    hello,
    I get FileCreateError when I run the code.
    Can you help me?
    Thankyou

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick
    Very usefull video 😊

    I have a excel workbook with 16 sheets
    Each sheets has a fix name
    The sheets name has prefix of H or L & Customer Account
    Eg.
    Sheet1 = H10000
    Sheet2 = L10000
    Sheet3= H10001
    Sheet4= L10001
    Sheet5= L10002
    Sheet6= L10002
    Etc.
    I want to split the 16 sheets in to 8 new Excel workbooks

    Wb1 : Sheet1 = sheet name H10000
    Wb1 : Sheet1 = sheet name L10000
    Save as filename from list in original workbook

    Wb2 : Sheet1 = sheet name H10002
    Wb2 : Sheet2 = sheet name L10002
    Save as = filename from a list in the original workbook

    Wb3: Sheet1 = sheet name H10003
    Wb3: Sheet2 = sheet name L10003
    Save as = filename from a list in the original workbook
    Etc.

  • Derrick Sherrill
    December 22, 2020

    this is awesome, thank you

  • Derrick Sherrill
    December 22, 2020

    Hi, Thanks for Video, However I need to supply another column with fix value in filter, How can I do that?

  • Derrick Sherrill
    December 22, 2020

    Man you are great please keep it up – I'm learning so much from you

  • Derrick Sherrill
    December 22, 2020

    Hi ,Derrick can you share me how to remove connection busy issue in pyodbc

  • Derrick Sherrill
    December 22, 2020

    Randomly clicked on this video and this was the EXACT problem I needed to solve at work. You're the best!!

  • Derrick Sherrill
    December 22, 2020

    Just another helpful video as usual…Thx 👍👍👏👏👏

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick, Many thanks for sharing this, really very helpful. Can you also do something on how to work on time formats? How to convert time format, fix missing format etc.? Thanks again,

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick,

    Thanks for an amazing guide.

    I tried to extend the concept to make a more generic solution whereby I have some inputs for each file i.e. the columns i wanna split on, Root directory where I want the results etc. However, while the code works fine, the data is not getting split in the workbooks, I just have headers in the splitted workbooks.

    Here's the code for your reference :

    root_folder = 'C:/Users/XYZ'/Downloads

    excel_file_path = 'C:/Users/XYZ/Downloads/training_status.xlsx'

    variable = 'Shift'

    df = pd.read_excel(excel_file_path)

    # print(df)

    split_value = df[variable].unique()

    #print(split_value)

    for value in split_values:

    df1 = df[df[str(variable)] == value]

    output_file_name = str(root_folder) + "Data_" + str(value) + ".xlsx"

    df1.to_excel(output_file_name, index=False)

    print ("Done")

  • Derrick Sherrill
    December 22, 2020

    I am a very beginner in Python. never used Pandas, I still got that 100%

  • Derrick Sherrill
    December 22, 2020

    you are great

  • Derrick Sherrill
    December 22, 2020

    Hello Derrick. i really like the way you communicate and explain the python language. However, I ve seen that you are using only pandas for reading excel workbooks. I would like to ask you why you are not using xlrd at all? I am building a database with data exported from multiple workbooks and it is my understanding that xlrd offers the capability to insert data to a database hosted in MySql. Furthermore, based on your deep know-how in data sets, which functionality of python would you use to export DATES data from excel to a MySQl database? Thanks for sharing your knowledge!

  • Derrick Sherrill
    December 22, 2020

    Thank You for this wonderful and simple video

  • Derrick Sherrill
    December 22, 2020

    Hey Derrick, thanks a lot for the video. It was very usefull for me.

    In my case, the process took too long, with 162k records and 78 unique spreadsheets as output. But, it is another thing.
    Thanks again.

  • Derrick Sherrill
    December 22, 2020

    Hey Derrick, Great work buddy. I have scenario for you : let's say you have multiple excel files with same type of data but the column names are not standardized, sequenced properly. How can we use pandas to standardize them and load them into a different excel file stacked vertically

  • Derrick Sherrill
    December 22, 2020

    base on the code, is there any possibility to specify the output directory, the name is generated base on cell and so I can not put the full directory on it, can you pls teach me how. THx.

  • Derrick Sherrill
    December 22, 2020

    Just use power query for fuck's sake.

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick, can you help me to automate copying table from Excel and paste it into mail and send

  • Derrick Sherrill
    December 22, 2020

    Hi Derick, can we write a code to separate a cell value which has delimiter into different columns.

    eg : cell A contains asds.dffd.234.12re

    Can this be split into 4 different columns using pandas? can we have a video on that?

  • Derrick Sherrill
    December 22, 2020

    I have a database of employee shift pattern and the shifts can be unique (5 shifts max as the employee can work 5 days a week). Below is my code which fetch results for an employee and update the tKinter boxes with those values ; however, the issue is that it adds curli braces({}) before and end of the string/answers… My question is that how can I remove those braces…? Someone told me to convert the list into set(), but a set consists of unique data and in my case data can be unique for a max of 5 days… Please help me how to remove those brackets. I have been working on this for last 6 hrs but no luck.

    def showRecord(self):

    connection = sqlite3.connect("../employee.db")

    connection.text_factory = sqlite3.OptimizedUnicode

    cursor = connection.cursor ()

    cursor.execute ( '''SELECT "Scheduled Shift" FROM employee_details WHERE Ecode = "5568328"''' )

    items = cursor.fetchall ()

    self.Employee1_FirstDay_ActualShift.set(items[0])

    self.Employee1_SecondDay_ActualShift.set(items[1])

    self.Employee1_ThirdDay_ActualShift.set(items[2])

    self.Employee1_FourthDay_ActualShift.set(items[3])

    self.Employee1_FifthDay_ActualShift.set(items[4])

    self.Employee1_SixthDay_ActualShift.set(items[5])

    self.Employee1_SeventhDay_ActualShift.set(items[6])

    connection.commit ()

    connection.close ()

    https://i.stack.imgur.com/wvoPZ.png (link to pic)

  • Derrick Sherrill
    December 22, 2020

    Thanks a lot Derrick. This saves a lot of manual hours for me.

  • Derrick Sherrill
    December 22, 2020

    Hi Derrick… Good Video. May i know how to name the excel sheets as shift [1,2,3] along with the filenames. Also I need only one column value mapped to this shift.

  • Derrick Sherrill
    December 22, 2020

    How to compare data in excel?

Write a comment