Skip to article frontmatterSkip to article content

3. Advanced Python in Excel

Authors
Affiliations
Microsoft
Anaconda

Necessary Tools

  • Python in Excel
    • Should be available in most versions of Excel
    • Try typing =PY and hit tab

Python in Excel Review

1

Create a =PY cell in D1 and type

df = pd.DataFrame({
    'Student': ['Alice', 'Bob', 'Cesar', 'Dot', 'Eustace'],
    'Score': [94, 65, 96, 78, 89]
})
df.sort_values('Score', ascending=False).reset_index(drop=True)

Open the card view. A useful view of the DataFrame is presented, but there are additional attributes available.


2

Type in cell G1

=D1.arrayPreview

The values from the DataFrame should spill to G1:I6

NOTE: If the DataFrame were larger, this would not include all the values. It only includes all the values if the DataFrame is small.


3

Type in cell G8

=D1.Python_type

This returns the Python string for the pandas DataFrame type.


4

Now switch the output type of cell D1 from Python Object to Excel Value

The full DataFrame spills in D1:E6, and the card attributes are no longer available.

This highlights the benefits of both output types:

  • Python Object can have additional attributes
  • Excel Value makes the full data in the grid

The choice of which to use is up to you, but the best choice is often a Python dict returned as an Excel Value.


Built-in Reprs

Excel has built-in handling of the following types:

  1. Lists / 1D arrays
  2. Lists of lists / 2D arrays / DataFrames
  3. Dictionaries
  4. Static images / Matplotlib charts

Let’s look at these before building our own custom reprs.


5

Create a =PY cell in A10 and type

[1, 3, 5, 7, 9]

Open the card view and observe the List View of a card.


6

Create a =PY cell in D10 and type

[[1, 2, 3], [4, 5, 6]]

Open the card view and observe the Array View of a card.

If you open the card view for D1, you will see a similar Array View with a header row.


7

Create a =PY cell in G10 and type

{
  "So": True,
  "cherry": "pie",
  3: 14,
  "A": xl("A10#"),
  "B": xl("D10#"),
}

Open the card view and observe each key: value pair. Notice that A and B simply show the type, not the actual values.


8

We can access these values as card attributes.

Type the following into the indicated cells:

  • I10 =G10.So
  • I11 =G10.cherry
  • I12 =G10.3
  • I13 =G10.A
  • K10 =G10.B

We get the values out of the dictionary. Exciting! But not helpful for “A” and “B”.


9

Switch the output type of G10 to Excel Value.

Now we get the values we want in I13 and K10!

And G10 stays in a compact form. Best of both worlds. In fact, open the card view in G10 and view the upgrades.

We now have drill-down capabilities.


10

I personally think a Python dict in Excel Value output mode is the most useful return type.

  1. Compact form
  2. Access to full values
  3. Ability to return multiple values from a Python cell

I don’t necessarily like the icon or the “dict” label, but we can improve on that using Custom Reprs.


Images

Before we move on, let’s talk about images.

Excel can display static images in data cards and in cells.


11

Create a =PY cell in A12 and type

df.plot(kind='bar', x='Student', y='Score')

The card view shows both the image and the size.


12

The image can be accessed as =A12.image, but I find it most useful to convert to Excel Value.

When doing this, a new hover icon appears to Create Reference. This will create a floating image linked to the cell.


Custom Reprs

Now we will look at controlling how Python objects are rendered in Excel, both as an Excel Value and as a Python Object.

13 - Fresh Start

Let’s create a new sheet named “Custom Reprs” to keep our learning cleanly separated.


Question

What if your Python object is not one of the built-in types?

One approach is to manually return one of the built-in types as the last call in your cell.

14

Let’s look at how Excel handles Python’s complex numbers.

Hint: it doesn’t

Create a =PY cell in D1

c = 25 + 42j

The default entity card is used.

Can we access the real and imag pieces?

No. Excel doesn’t know anything about the complex data type.


15

Switch the output to Excel Value.

Observe that we get an #N/A error. Excel doesn’t know how to display a complex number.


16

Let’s help Excel out by giving it something it does know how to render.

c = 25 + 42j
[c.real, c.imag]

Now we get the complex number split into real and imaginary components in the grid.

17

If we wanted it to spill horizontally, we could do that as well.

c = 25 + 42j
[[c.real, c.imag]]

Associate a type with a repr

Excel has a mechanism to associate any Python object with a repr function. Let’s create one now to handle complex types.

NOTE: Ideally this would go in Initialization, but as it’s not editable, we will place this in cell A1.

18

Create a =PY cell in A1

def _complex_value_repr(c):
    return [[c.real, c.imag]]

excel.repr.register_repr_xl_value(complex, _complex_value_repr)

This creates a function to return the horizontal result we built previously. Then we register the function for the complex type.


19

Change cell D1 so it returns a complex number

c = 25 + 42j

It now works because _complex_value_repr is being called to render the object as an Excel value.


20

Let’s update how the Python Object is represented.

Go ahead and change **D1** back to Python Object output mode.


21

Edit cell A1

Add the following code:

def _complex_preview_repr(c):
    return { 'real': c.real, 'imag': c.imag }

excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)

Run the cell and watch D1 change to a dict output with real and imag properties available.


Recap for Associating a type with a Custom Repr

  1. Define a function
  2. Register the function using
    • excel.repr.register_repr_xl_value(type, function) for Excel Value output
    • excel.repr.register_repr_xl_preview(type, function) for Python Object output
  3. Note: Provide the Python type or class (ex. complex), not a string (‘complex’)

Alternate method for Classes

If you are creating your own custom Class, there are special methods _repr_xl_value_ and _repr_xl_preview_ which will achieve the same result without requiring you to register a repr function.

If you are writing your own class, this is the better approach.

22

Create a =PY cell in A2

import math

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    
    def distance(self):
        return math.sqrt(self.x ** 2 + self.y ** 2)
    
    def _repr_xl_value_(self):
        return [['x', self.x],
                ['y', self.y]]

"Point class"

Note: It is good practice to return a string in cells where you are defining functions. Instead of showing “None”, it helps you remember what the cell is doing.


23

Create a Point in cell A5

p = Point(3, 4)

Change the output type to Excel Value and see how the _repr_xl_value_ function controls how the object is spilled to Excel.

NOTE: Notice that I didn’t have to register this. Any class with this special function will use it to represent itself in Excel.


Custom Entity Cards

Up to this point, we have simply transformed a Python object into something Excel knows how to represent

  • complex -> dict
  • complex -> list
  • Point -> list of lists

NOTE: We’re about to go deep!

Excel has a way to customize the Python Object card view by returning a specially formatted dictionary.

Detailed information about entity cards: https://learn.microsoft.com/en-us/office/dev/add-ins/excel/excel-data-types-entity-card

24

Let’s create a custom entity card for complex numbers.

Update the preview code in cell A1

def _complex_preview_repr(c):
    return {
        "type": "Entity",
        "text": f"complex({c.real}, {c.imag})",
        "properties": {
            "real": c.real,
            "imag": c.imag,
        },
    }

excel.repr.register_repr_xl_preview(complex, _complex_preview_repr)

"complex repr"

After running this, notice that D1 looks much nicer.

And it still have real and imag properties!


Power Tip

For the preview repr, only 3 things are needed in the dict:

  • type: str -- must be "Entity"
  • text: str
  • properties: dict

There are other ways to customize the output, but these are the minimum required for Excel to build a card view.


25

Why does imag come before real? Excel sorts properties alphabetically by default.

We can change the order by defining a layout.


26

Let’s force real to come before imag.

Update cell A1

def _complex_preview_repr(c):
    return {
        "type": "Entity",
        "text": f"complex({c.real}, {c.imag})",
        "properties": {
            "real": c.real,
            "imag": c.imag,
        },
        #####################
        # This part is new
        #####################
        "layouts": {
            "card": {
                "sections": [
                    {
                        "layout": "List",
                        "properties": ["real", "imag"],
                    },
                ],
            },
        },
        ######################
    }

27

Let’s add some new properties.

We want:

  1. vector -- both real and imag as a tuple
  2. R -- the length of the (real, imag) vector
  3. theta -- the angle in radians of the (real, imag) vector

Update cell A1:

import math # <-- add this

# ...

def _complex_preview_repr(c):
    return {
        "type": "Entity",
        "text": f"complex({c.real}, {c.imag})",
        "properties": {
            "real": c.real,
            "imag": c.imag,
            "vector": (c.real, c.imag),
            "R": abs(c),
            "theta": math.atan2(c.imag, c.real),
        },
        "layouts": {
            "card": {
                "sections": [
                    {
                        "layout": "List",
                        "properties": ["real", "imag"],
                    },
                    {
                        "layout": "List",
                        "properties": ["vector", "R", "theta"],
                    },
                ],
            },
        },
    }

28

Extract the vector Field. One thing to notice is that these fields are normal Excel objects. They can be placed anywhere (doesn’t need to respect row-major order).

Also, you can mix them with other Excel formulas. Try this:

=TRANSPOSE(D1.vector)

Bonus - Icons!

Card views aren’t only for a Python Object preview. You can also return them for Excel Value output type.

There is one additional feature unlocked for Excel Value -- you can specify a custom icon.


29

Let’s change the Point class Excel Value to be a card view.

Edit cell A2. Replace the _repr_xl_value_ method.

    def _repr_xl_value_(self):
        return {
            "type": "Entity",
            "text": f"Point<{self.x},{self.y}>",
            "properties": {
                "x": self.x,
                "y": self.y,
                "length": self.distance(),
            },
            "layouts": {
                "compact": {
                    "icon": "DataScatter",
                },
                "card": {
                    "sections": [
                        {
                            "layout": "List",
                            "properties": ["x", "y", "length"],
                        }
                    ]
                }
            }
        }

Now we have a fun icon in A5 so our Point object stands out!


Build Time!

30

Let’s create a Result class which improves on the dictionary representation.

Challenge

  1. Create a class Result which takes
    • d
    • name
    • icon
  2. Add a _repr_xl_value_ function which returns a card view using d, name, and icon
  3. Add a _repr_xl_preview_ function which calls _repr_xl_value_
  4. Create a Result in a cell and test it out

Data Conversion using xl()

Python in Excel has default conversion from Excel types to Python types when using xl(), but this can be overridden.


31 - Fresh Start

Add a new sheet and rename it “Data Conversion”.

You should now have three sheets:

  • Review
  • Custom Repr
  • Data Conversion (active)

Scalar xl() conversion

32

Copy the following and Paste Special -> Text into cell B5

TRUE
Hello PyData
-159
3.14159
5/5/2005
2024/01/01 13:14:15
0:01:59.4

33

Create a =PY in C5 and type

val = xl("B5")
type(val).__name__

Drag the formula to see how each Excel value is converted into Python.

NOTE: Even though Excel stores dates as numbers, the xl() function looks at the formatting to infer dates and times.


34 - Don’t Try this at Home!

In general, don’t drag Python formulas like this

We’re doing it here so xl() has a single value for input, but normally for contiguous data you want to select the whole range and work with it as a single table.

Repeating Python formulas will slow down the execution significantly!


Range xl() conversion

When selecting a range, Excel will attempt to infer if the range has headers.

Ranges are always converted into pandas DataFrames. If no headers are provided, it defaults to numbered headers (0-based).

35

Paste Special -> Text into cell F5 the following

={"Student","Score"; "Alice",94; "Bob",85; "Cesar",92; "Dot",88;"Eustace",79}

36

Create a =PY cell in I5.

Drag F5:G10. Expect Excel to write xl("F5:G10", headers=True) in the formula bar.

Create another =PY cell in I6.

Drag F6:G10 (don’t include the header). Excel now writes xl("F6:G10") in the formula bar.

In both cases, the output should say “[PY] DataFrame”.


Custom xl array conversion

What if we don’t want a Range to automatically be converted to a pandas DataFrame?

We can provide a custom conversion from an Excel Range to a Python object.

37

Create a =PY cell in A1

def our_custom_conversion(obj, headers=False, **kwargs):
    """Convert from Excel to Python

    obj is a list of lists
    """
    return np.array(obj)

excel.set_xl_array_conversion(our_custom_conversion)

"xl() conversion function"

Excel ranges will now be converted to a numpy array instead of a pandas DataFrame.

You can see that cells I5 and I6 now say ndarray.

Just like custom output reprs, this conversion function affects all Python cells.

REMINDER: A good practice when definining functions or other things which return None is to end with a string giving some indication of what the cell does.


38

What about scalar conversion?

Notice that cells in Column C did not change.

Scalar conversion can be overwritten using a different function:

excel.set_xl_scalar_conversion(func)

We’re not going to worry about that for now, but it’s good to know that it exists.


39

Now, can we make our conversion function even more flexible? Of course!

Let’s update our conversion to allow for:

  • DataFrame
  • ndarray
  • dict
  • list

Update cell A1 to read

def our_custom_conversion(obj, headers=False, type='DataFrame', **kwargs):
    """Convert from Excel to Python

    obj is a list of lists
    """
    if type in {None, 'DataFrame', 'pandas', 'pd', 'dataframe'}:
        return excel.convert_to_dataframe(obj, headers=headers, **kwargs)
    if type in {'ndarray', 'numpy', 'np', 'array'}:
        if headers:
            raise ValueError('Numpy does not support headers')
        return np.array(obj, **kwargs)
    if type in {dict, 'dict'}:
        return {row[0]: row[1] for row in obj}
    if type in {list, 'list'}:
        if len(obj) == 1:  # row-wise list
            return obj[0]
        else:  # col-wise list
            return [row[0] for row in obj]
    raise TypeError(f'Unknown type: {type}')

excel.set_xl_array_conversion(our_custom_conversion)

"xl() conversion function"

We added a type keyword, allowing us to specify what output type we want. Importantly, the default is the usual DataFrame so we are adding extra functionality, not changing the default behavior.


40

Cells I5 and I6 should now say “[PY] DataFrame” again.

Modify cell I6 with type='array' to convert it to a numpy array.


41

Create a =PY cell in I7

xl("F6:G10", type='dict')

Now we have an easy way to convert two columns into a Python dict for each lookup tables.


42

Let’s verify the list behavior.

Create a =PY cell in I8

xl("G6:G10", type=list)

Grabbing a list from a column is very nice rather than needing to extract it from a DataFrame.


43 - Discussion

Why create and register custom xl() loaders?

For a small amount of code in a workbook, probably not worth the effort.

For a larger workbook, if pandas DataFrame is not the right input format, a lot of code will be created simply to undo that step, complicating the analysis and making the real logic harder to follow.


WARNING: xl() is a MACRO, not a function

Magic is happening before your code is sent to Azure. Think of xl() as a pre-processing MACRO, not as a true function.

44

This will fail with a KeyError

addr = "B6"
xl(addr)

45

This will fail with a KeyError

f'This is a string talking about {xl("B6")}'

46

This will fail with a KeyError

from functools import partial

xlheaders = partial(xl, headers=True)
xlheaders("F5:G10")

47

The lesson is that while you can customize how xl() is handled, it isn’t actually a true function, but rather acts like a pre-process MACRO.

Excel does some magic to resolve the reference, and the result is given as obj to the conversion function. But don’t expect xl() to be as flexible as normal Python functions.

This normally isn’t an issue, but is something to be aware of, just in case you see an unexpected KeyError.


Wrap Up

Hopefully you have some new (advanced) tools in your Excel toolbelt.

  • Rewriting Python objects to known types with built-in Excel repr support
  • Creating a custom card view for your Python object
  • Modifying how Excel data is converted to Python via xl()