In this section of the tutorial, we will learn how to work with spreadsheets using the openpyxl
library in Python. The corrisponding slides for this section are available on the repo.
openpyxl
is a powerful library that allows you to read, write, and manipulate spreadsheet files in the Excel 2010 xlsx
/xlsm
/xltx
/xltm
format.
There are a few other options that might address other usecases:
pylightxl
if you need just cell data and want a lightweight, pure python libraryxlsxwriter
another one of the engines that Pandas can use to read/write Excel files
N.B. OpenPyXL needs
pillow
to work with images in the spreadsheets, so make sure to install it in your environment if you are not working in the dev container!
Loading data from a spreadsheet¶
from openpyxl import load_workbook
wb = load_workbook(filename = 'sample-data/iris-data.xlsx')
print(wb.active['g3'].value)
wb.sheetnames
<openpyxl.worksheet.formula.ArrayFormula object at 0x7f9c9f86b750>
['Iris']
N.B. openpyxl does currently not read all possible items in an Excel file so shapes will be lost from existing files if they are opened and saved with the same name.
Again, some potentially useful options to load_workbook
are:
data_only
controls whether cells with formulae have either the formula (default) or the value stored the last time Excel read the sheet.keep_vba
controls whether any Visual Basic elements are preserved or not (default). If they are preserved they are still not editable.read-only
opens workbooks in a read-only mode. This uses much less memory and is faster but not all features are available (charts, images, etc.)rich_text
controls whether any rich-text formatting in cells is preserved. The default is False.keep_links
controls whether data cached from external workbooks is preserved
from openpyxl import load_workbook
wb = load_workbook(filename = 'sample-data/iris-data.xlsx', data_only=True)
print(wb.active['g3'].value)
#VALUE!
Creating a basic spreadsheet from scratch¶
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample-data/scratch.xlsx")
Creating a fancier workbook¶
Let’s try and make a spreadsheet now from python that has some fancier formatting and a basic chart.
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
card_data = [
["Name", "Mana Color", "Cost"],
["Lightning, Army of One", "Red, White", 2],
["Traveling Chocobo", "Green", 3],
["Y'shtola Rhul", "Blue", 6]
]
Next we’ll enter this data onto the worksheet. As this is a list of lists, we can simply use the Worksheet.append()
function.
for row in card_data:
ws.append(row)
Now we should make our heading Bold to make it stand out a bit more, to do that we’ll need to create a styles.Font and apply it to all the cells in our header row.
from openpyxl.styles import Font
ft = Font(bold=True)
for row in ws["A1:C1"]:
for cell in row:
cell.font = ft
We can also use built-in worksheet functions that OpenPyXL supports. Doing things this way means if someone manually edits the spreadsheet it will automatically update values.
from openpyxl.utils import FORMULAE
list(FORMULAE)[:15]
['PEARSON',
'IMEXP',
'IMSUM',
'COUNTIFS',
'IMSUB',
'CUBEMEMBERPROPERTY',
'MOD',
'ERFC',
'BAHTTEXT',
'DOLLARFR',
'TRANSPOSE',
'CHITEST',
'QUARTILE',
'FACTDOUBLE',
'LEFT']
ws["C5"] = "=SUM(C2:C4)"
It’s time to make some charts. First, we’ll start by importing the appropriate packages from openpyxl.chart
then define some basic attributes
from openpyxl.chart import BarChart, Series, Reference
chart = BarChart()
chart.type = "col"
chart.title = "Card CMC Cost"
chart.y_axis.title = 'Mana Cost'
chart.x_axis.title = 'Card Name'
chart.legend = None
That’s created the skeleton of what will be our bar chart. Now we need to add references to where the data is and pass that to the chart object
data = Reference(ws, min_col=3, min_row=2, max_row=4, max_col=3)
categories = Reference(ws, min_col=1, min_row=2, max_row=4, max_col=1)
chart.add_data(data)
chart.set_categories(categories)
# Finally we can add it to the sheet.
ws.add_chart(chart, "E1")
wb.save("card_data.xlsx")
There are a lot more options here for building out your custom worksheets, including adding images, shapes, and more. You can find the full documentation for OpenPyXL here.