Necessary Tools¶
- Python in Excel
- Should be available in most versions of Excel
- Try typing
=PY
and hit tab
Python in Excel Review¶
1¶
Create a =PY
cell in D1 and type
df = pd.DataFrame({
'Student': ['Alice', 'Bob', 'Cesar', 'Dot', 'Eustace'],
'Score': [94, 65, 96, 78, 89]
})
df.sort_values('Score', ascending=False).reset_index(drop=True)
Open the card view. A useful view of the DataFrame is presented, but there are additional attributes available.
2¶
Type in cell G1
=D1.arrayPreview
The values from the DataFrame should spill to G1:I6
NOTE: If the DataFrame were larger, this would not include all the values. It only includes all the values if the DataFrame is small.
4¶
Now switch the output type of cell D1 from Python Object to Excel Value
The full DataFrame spills in D1:E6, and the card attributes are no longer available.
This highlights the benefits of both output types:
- Python Object can have additional attributes
- Excel Value makes the full data in the grid
The choice of which to use is up to you, but the best choice is often a Python dict
returned as an Excel Value.
Built-in Reprs¶
Excel has built-in handling of the following types:
- Lists / 1D arrays
- Lists of lists / 2D arrays / DataFrames
- Dictionaries
- Static images / Matplotlib charts
Let’s look at these before building our own custom reprs.
5¶
Create a =PY
cell in A10 and type
[1, 3, 5, 7, 9]
Open the card view and observe the List View of a card.
6¶
Create a =PY
cell in D10 and type
[[1, 2, 3], [4, 5, 6]]
Open the card view and observe the Array View of a card.
If you open the card view for D1, you will see a similar Array View with a header row.
7¶
Create a =PY
cell in G10 and type
{
"So": True,
"cherry": "pie",
3: 14,
"A": xl("A10#"),
"B": xl("D10#"),
}
Open the card view and observe each key: value pair. Notice that A and B simply show the type, not the actual values.
8¶
We can access these values as card attributes.
Type the following into the indicated cells:
- I10 =G10.So
- I11 =G10.cherry
- I12 =G10.3
- I13 =G10.A
- K10 =G10.B
We get the values out of the dictionary. Exciting! But not helpful for “A” and “B”.
9¶
Switch the output type of G10 to Excel Value.
Now we get the values we want in I13 and K10!
And G10 stays in a compact form. Best of both worlds. In fact, open the card view in G10 and view the upgrades.
We now have drill-down capabilities.
10¶
I personally think a Python dict in Excel Value output mode is the most useful return type.
- Compact form
- Access to full values
- Ability to return multiple values from a Python cell
I don’t necessarily like the icon or the “dict” label, but we can improve on that using Custom Reprs.
Images¶
Before we move on, let’s talk about images.
Excel can display static images in data cards and in cells.
11¶
Create a =PY
cell in A12 and type
df.plot(kind='bar', x='Student', y='Score')
The card view shows both the image and the size.
12¶
The image can be accessed as =A12.image
, but I find it most useful to convert to Excel Value.
When doing this, a new hover icon appears to Create Reference. This will create a floating image linked to the cell.
Custom Reprs¶
Now we will look at controlling how Python objects are rendered in Excel, both as an Excel Value and as a Python Object.
13 - Fresh Start¶
Let’s create a new sheet named “Custom Reprs” to keep our learning cleanly separated.
Question¶
What if your Python object is not one of the built-in types?
One approach is to manually return one of the built-in types as the last call in your cell.
14¶
Let’s look at how Excel handles Python’s complex
numbers.
Hint: it doesn’t
Create a =PY
cell in D1
c = 25 + 42j
The default entity card is used.
Can we access the real
and imag
pieces?
No. Excel doesn’t know anything about the complex
data type.
15¶
Switch the output to Excel Value.
Observe that we get an #N/A
error. Excel doesn’t know how to display a complex number.
16¶
Let’s help Excel out by giving it something it does know how to render.
c = 25 + 42j
[c.real, c.imag]
Now we get the complex number split into real and imaginary components in the grid.
Associate a type with a repr¶
Excel has a mechanism to associate any Python object with a repr function. Let’s create one now to handle complex
types.
NOTE: Ideally this would go in Initialization, but as it’s not editable, we will place this in cell A1.
18¶
Create a =PY
cell in A1
def _complex_value_repr(c):
return [[c.real, c.imag]]
excel.repr.register_repr_xl_value(complex, _complex_value_repr)
This creates a function to return the horizontal result we built previously. Then we register the function for the complex
type.
19¶
Change cell D1 so it returns a complex number
c = 25 + 42j
It now works because _complex_value_repr
is being called to render the object as an Excel value.
20¶
Let’s update how the Python Object is represented.
Go ahead and change **D1**
back to Python Object
output mode.
21¶
Edit cell A1
Add the following code:
def _complex_preview_repr(c):
return { 'real': c.real, 'imag': c.imag }
excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)
Run the cell and watch D1 change to a dict
output with real
and imag
properties available.
Recap for Associating a type with a Custom Repr¶
- Define a function
- Register the function using
excel.repr.register_repr_xl_value(type, function)
for Excel Value outputexcel.repr.register_repr_xl_preview(type, function)
for Python Object output
- Note: Provide the Python type or class (ex. complex), not a string (‘complex’)
Alternate method for Classes¶
If you are creating your own custom Class, there are special methods _repr_xl_value_
and _repr_xl_preview_
which will achieve the same result without requiring you to register a repr function.
If you are writing your own class, this is the better approach.
22¶
Create a =PY
cell in A2
import math
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def distance(self):
return math.sqrt(self.x ** 2 + self.y ** 2)
def _repr_xl_value_(self):
return [['x', self.x],
['y', self.y]]
"Point class"
Note: It is good practice to return a string in cells where you are defining functions. Instead of showing “None”, it helps you remember what the cell is doing.
23¶
Create a Point
in cell A5
p = Point(3, 4)
Change the output type to Excel Value and see how the _repr_xl_value_
function controls how the object is spilled to Excel.
NOTE: Notice that I didn’t have to register this. Any class with this special function will use it to represent itself in Excel.
Custom Entity Cards¶
Up to this point, we have simply transformed a Python object into something Excel knows how to represent
- complex -> dict
- complex -> list
- Point -> list of lists
NOTE: We’re about to go deep!
Excel has a way to customize the Python Object
card view by returning a specially formatted dictionary.
Detailed information about entity cards: https://
24¶
Let’s create a custom entity card for complex numbers.
Update the preview code in cell A1
def _complex_preview_repr(c):
return {
"type": "Entity",
"text": f"complex({c.real}, {c.imag})",
"properties": {
"real": c.real,
"imag": c.imag,
},
}
excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)
"complex repr"
After running this, notice that D1 looks much nicer.
And it still have real
and imag
properties!
Power Tip¶
For the preview repr, only 3 things are needed in the dict:
type
: str -- must be"Entity"
text
: strproperties
: dict
There are other ways to customize the output, but these are the minimum required for Excel to build a card view.
25¶
Why does imag
come before real
? Excel sorts properties alphabetically by default.
We can change the order by defining a layout
.
26¶
Let’s force real
to come before imag
.
Update cell A1
def _complex_preview_repr(c):
return {
"type": "Entity",
"text": f"complex({c.real}, {c.imag})",
"properties": {
"real": c.real,
"imag": c.imag,
},
#####################
# This part is new
#####################
"layouts": {
"card": {
"sections": [
{
"layout": "List",
"properties": ["real", "imag"],
},
],
},
},
######################
}
27¶
Let’s add some new properties.
We want:
vector
-- both real and imag as a tupleR
-- the length of the (real, imag) vectortheta
-- the angle in radians of the (real, imag) vector
Update cell A1:
import math # <-- add this
# ...
def _complex_preview_repr(c):
return {
"type": "Entity",
"text": f"complex({c.real}, {c.imag})",
"properties": {
"real": c.real,
"imag": c.imag,
"vector": (c.real, c.imag),
"R": abs(c),
"theta": math.atan2(c.imag, c.real),
},
"layouts": {
"card": {
"sections": [
{
"layout": "List",
"properties": ["real", "imag"],
},
{
"layout": "List",
"properties": ["vector", "R", "theta"],
},
],
},
},
}
28¶
Extract the vector
Field. One thing to notice is that these fields are normal Excel objects. They can be placed anywhere (doesn’t need to respect row-major order).
Also, you can mix them with other Excel formulas. Try this:
=TRANSPOSE(D1.vector)
Bonus - Icons!¶
Card views aren’t only for a Python Object
preview. You can also return them for Excel Value
output type.
There is one additional feature unlocked for Excel Value
-- you can specify a custom icon.
29¶
Let’s change the Point class Excel Value
to be a card view.
Edit cell A2. Replace the _repr_xl_value_
method.
def _repr_xl_value_(self):
return {
"type": "Entity",
"text": f"Point<{self.x},{self.y}>",
"properties": {
"x": self.x,
"y": self.y,
"length": self.distance(),
},
"layouts": {
"compact": {
"icon": "DataScatter",
},
"card": {
"sections": [
{
"layout": "List",
"properties": ["x", "y", "length"],
}
]
}
}
}
Now we have a fun icon in A5 so our Point object stands out!
Build Time!¶
30¶
Let’s create a Result
class which improves on the dictionary representation.
Challenge¶
- Create a class
Result
which takes- d
- name
- icon
- Add a
_repr_xl_value_
function which returns a card view using d, name, and icon - Add a
_repr_xl_preview_
function which calls_repr_xl_value_
- Create a
Result
in a cell and test it out
Data Conversion using xl()¶
Python in Excel has default conversion from Excel types to Python types when using xl()
, but this can be overridden.
31 - Fresh Start¶
Add a new sheet and rename it “Data Conversion”.
You should now have three sheets:
- Review
- Custom Repr
- Data Conversion (active)
Scalar xl() conversion¶
32¶
Copy the following and Paste Special -> Text into cell B5
TRUE
Hello PyData
-159
3.14159
5/5/2005
2024/01/01 13:14:15
0:01:59.4
Drag the formula to see how each Excel value is converted into Python.
NOTE: Even though Excel stores dates as numbers, the xl()
function looks at the formatting to infer dates and times.
34 - Don’t Try this at Home!¶
In general, don’t drag Python formulas like this
We’re doing it here so xl() has a single value for input, but normally for contiguous data you want to select the whole range and work with it as a single table.
Repeating Python formulas will slow down the execution significantly!
Range xl() conversion¶
When selecting a range, Excel will attempt to infer if the range has headers.
Ranges are always converted into pandas DataFrames. If no headers are provided, it defaults to numbered headers (0-based).
35¶
Paste Special -> Text into cell F5 the following
={"Student","Score"; "Alice",94; "Bob",85; "Cesar",92; "Dot",88;"Eustace",79}
36¶
Create a =PY
cell in I5.
Drag F5:G10. Expect Excel to write xl("F5:G10", headers=True)
in the formula bar.
Create another =PY
cell in I6.
Drag F6:G10 (don’t include the header). Excel now writes xl("F6:G10")
in the formula bar.
In both cases, the output should say “[PY] DataFrame”.
Custom xl array conversion¶
What if we don’t want a Range to automatically be converted to a pandas DataFrame?
We can provide a custom conversion from an Excel Range to a Python object.
37¶
Create a =PY
cell in A1
def our_custom_conversion(obj, headers=False, **kwargs):
"""Convert from Excel to Python
obj is a list of lists
"""
return np.array(obj)
excel.set_xl_array_conversion(our_custom_conversion)
"xl() conversion function"
Excel ranges will now be converted to a numpy array instead of a pandas DataFrame.
You can see that cells I5 and I6 now say ndarray
.
Just like custom output reprs, this conversion function affects all Python cells.
REMINDER: A good practice when definining functions or other things which return None
is to end with a string giving some indication of what the cell does.
38¶
What about scalar conversion?
Notice that cells in Column C did not change.
Scalar conversion can be overwritten using a different function:
excel.set_xl_scalar_conversion(func)
We’re not going to worry about that for now, but it’s good to know that it exists.
39¶
Now, can we make our conversion function even more flexible? Of course!
Let’s update our conversion to allow for:
- DataFrame
- ndarray
- dict
- list
Update cell A1 to read
def our_custom_conversion(obj, headers=False, type='DataFrame', **kwargs):
"""Convert from Excel to Python
obj is a list of lists
"""
if type in {None, 'DataFrame', 'pandas', 'pd', 'dataframe'}:
return excel.convert_to_dataframe(obj, headers=headers, **kwargs)
if type in {'ndarray', 'numpy', 'np', 'array'}:
if headers:
raise ValueError('Numpy does not support headers')
return np.array(obj, **kwargs)
if type in {dict, 'dict'}:
return {row[0]: row[1] for row in obj}
if type in {list, 'list'}:
if len(obj) == 1: # row-wise list
return obj[0]
else: # col-wise list
return [row[0] for row in obj]
raise TypeError(f'Unknown type: {type}')
excel.set_xl_array_conversion(our_custom_conversion)
"xl() conversion function"
We added a type
keyword, allowing us to specify what output type we want. Importantly, the default is the usual DataFrame
so we are adding extra functionality, not changing the default behavior.
40¶
Cells I5 and I6 should now say “[PY] DataFrame” again.
Modify cell I6 with type='array'
to convert it to a numpy array.
41¶
Create a =PY
cell in I7
xl("F6:G10", type='dict')
Now we have an easy way to convert two columns into a Python dict for each lookup tables.
42¶
Let’s verify the list behavior.
Create a =PY
cell in I8
xl("G6:G10", type=list)
Grabbing a list from a column is very nice rather than needing to extract it from a DataFrame.
43 - Discussion¶
Why create and register custom xl() loaders?
For a small amount of code in a workbook, probably not worth the effort.
For a larger workbook, if pandas DataFrame is not the right input format, a lot of code will be created simply to undo that step, complicating the analysis and making the real logic harder to follow.
WARNING: xl() is a MACRO, not a function¶
Magic is happening before your code is sent to Azure. Think of xl() as a pre-processing MACRO, not as a true function.
46¶
This will fail with a KeyError
from functools import partial
xlheaders = partial(xl, headers=True)
xlheaders("F5:G10")
47¶
The lesson is that while you can customize how xl()
is handled, it isn’t actually a true function, but rather acts like a pre-process MACRO.
Excel does some magic to resolve the reference, and the result is given as obj
to the conversion function. But don’t expect xl()
to be as flexible as normal Python functions.
This normally isn’t an issue, but is something to be aware of, just in case you see an unexpected KeyError.
Wrap Up¶
Hopefully you have some new (advanced) tools in your Excel toolbelt.
- Rewriting Python objects to known types with built-in Excel repr support
- Creating a custom card view for your Python object
- Modifying how Excel data is converted to Python via xl()