People working as data analysts/scientists stumble upon requests for reports from different people. While ad-hoc reports are one time thing, reports that need to be sent periodically, be it daily or weekly or monthly, can take quite a lot of time if done manually. Hence, nn this tutorial, I will show how to automate excel reports using Python. The diagram below illustrates a typical excel report automation workflow.
Let us import the necessary libraries and establish a connection to our database with the necessary username and password. This can be done using the following block of code in the case of Oracle database.:
import cx_Oracle host = 'your-host-name' port = 'your-port-number-as-integer-without-quotes' sid = 'your-sid' tns = cx_Oracle.makedsn(host, port, sid) db = cx_Oracle.connect('your-username', 'your-password', tns)
Now that we have established a connection with the database, we will run the query using the
pandas library from a file and save it as a CSV file.
import pandas as pd from pathlib import Path query = Path('H:/periodic-analysis/analysis-query.sql').read_text() df = pd.read_sql(query, db)
If you have CSV or Excel file with your data already, you can also directly do:
import pandas as pd df = pd.read_csv('path/file.csv') ## or df = pd.read_excel('path/file.xlsx')
We have the data from our query or CSV/Excel file in a
pandas dataframe currently. We will save this data to a CSV file with the name
file_name = 'H:/periodic-analysis/latest_raw_data.csv' df.to_csv(file_name, sep = '|', index = False)
It is to be noted that this CSV file acts as the data source for the excel report. This tutorial assumes that you have already created a connection from your excel file to this raw data file.
Imagine you have the last excel report file that you had sent X days back to someone. You will typically create a copy of this file and then do the necessary changes. We will automate this step using Python as well.
This is a two step process: get the latest file in the directory and create a copy of the latest file in the same directory.
import glob import os list_of_files = glob.glob('H:/periodic-reports/final-reports/*') latest_file = max(list_of_files, key = os.path.getctime) latest_file = latest_file.split('2018', 1) latest_file = '2018' + latest_file
from shutil import copyfile import datetime today_date = datetime.datetime.today().strftime('%Y%m%d') src = 'H:/periodic-reports/final-reports/' + latest_file new_file = today_date + '_latest_periodic_report.xlsx' dst = 'H:/periodic-reports/final-reports/' + new_file copyfile(src, dst)
Let us assume you already had a report being sent often that is computed out of this raw CSV data file that we have generated earlier. Many people I have met think, at this step, they must open Excel, click the refresh button, and save the file. What if I tell you that even this step can be automated?
Let us the take the new file we created out of the old file and refresh it using Python itself.
import win32com.client import time SourcePathName = 'H:/periodic-reports/final-reports/' + new_file Application = win32com.client.DispatchEx('Excel.Application') Application.DisplayAlerts = False Application.Visible = 1 Workbook = Application.Workbooks.open(SourcePathName) Workbook.RefreshAll() time.sleep(20) Workbook.Save() Workbook.Close() Application.Quit()
The report is now ready. However, there is an additional step. We may want to notify the concerned people that the report is ready and is stored in XYZ shared location. In this case, the shared location is a Windows drive
H:. This can vary depending on how the shared network at your end looks like. You can send emails using Python by writing the following block of code:
import smtplib from email.mime.multipart import MIMEMultipart from email.mime.text import MIMEText fromaddr = 'firstname.lastname@example.org' toaddr = 'email@example.com' body = 'The ABC analysis report has been updated and saved in XYZ location' msg = MIMEMultipart() msg['From'] = fromaddr msg['To'] = toaddr msg['Subject'] = 'Periodic Analysis Report' msg.attach(MIMEText(body, 'plain')) server = smtplib.SMTP('smtp-name', 'port-number-without-quotes') server.starttls() server.login(fromaddr, 'password') text = msg.as_string() server.sendmail(fromaddr, toaddr, text) server.quit()
This can be considered the last step of this tutorial where you will schedule the entire process until now to run, update and send the email by itself. There are many ways to schedule tasks depending on which environment you are in. However, I will demonstrate a Pythonic way to schedule tasks using a Python library called schedule by Dan Bader.
Let us now assume that we need to send our Excel report every Monday at 12 PM. The block of code will look like:
import schedule import time def generate_weekly_report(): # The block of code from querying until sending email notification schedule.every().monday.at('12:00').do(generate_weekly_report) while True: schedule.run_pending() time.sleep(1)
Save the entire code in a file and run it from a Python terminal. You have now learnt how to refresh an Excel report by automating it using Python.