Bruno J. Navarro: By now, most everyone has heard about the potential benefits of artificial intelligence and generative AI. But for the growing number of business executives who are interested in the technology for its value creation possibilities or the promise of increased productivity, the first question might very well be, “Where do I start?” Today on the Workday Podcast, we’ll start to answer that question by speaking with Michael Schrage, a research fellow at the MIT Sloan School of Management Center for Digital Business, and Stacy Hilgendorf, chief accounting officer and corporate controller at Sprouts Farmers Market. Welcome.
Michael Schrage: Hi.
Navarro: Thanks for being here, Michael. You’ve spoken about the potential benefits of AI in revamping traditional business metrics, such as KPIs. Can you talk about how executives should be thinking about implementing that technology within their organizations?
Schrage: Well, first and foremost, thank you for inviting me for this opportunity to talk about such an important topic and theme. Unfortunately, you began by asking me a trick question, because when you say AI, those two words mean something dramatically different now than they did even a year, a year and a half ago. The rise of generative AI and large language models, as exemplified by OpenAI's GPT-4 and Anthropic's Claude 3 Opus, really has changed the game in regard to AI capability and purpose. When I was studying AI as an undergraduate and graduate student, the real focus was on rules-based artificial intelligence. That is to say, reason and intelligence were a function of following and applying certain kinds of rules.
With large language models, things are completely different. You have neural nets instead of rule sequences. You train neural-net models with literally billions of parameters. And the mechanisms are different: attention as a mechanism, self-attention as a mechanism. I don't want to go into the gory details, but in the same way we talk about correlation, causality, directed acyclic graphs, and all manner of algorithmic and computational techniques, large language models fundamentally rely on a set of techniques that emulate and mimic how minds work in unusual ways. And this has really changed, and should change, businesses' expectations of what AI means, AI's capabilities, and how they can get value from those capabilities. So I think it's a very big deal. A year ago, this was a nice-to-have discussion. In the next year, it's going to be a need-to-have discussion, and not just for the technologists and IT in business, but for the strategists, the marketers, the HR people, and the CFOs.
Navarro: That's fascinating. Today, you led a prompt-a-thon at Workday's Financials Product Advisory Council, a group of key customers across a variety of industries running Workday solutions for the office of the CFO. You've led a number of these prompt-a-thons in your role as an instructor in MIT Sloan Executive Education. What exactly is a prompt-a-thon, and why should executives participate in one?
Schrage: Well, first, I want to pay Workday the compliment of having the gumption to do such a prompt-a-thon, because it's a bit of a risky thing to do with some of your best customers. But in fact, now that we've done it, it was a good idea before, a good idea during, and a good idea after. Here's why. Prompts are one of the ways you interface and engage with the capabilities of large language models, whether that be a ChatGPT or a Claude. Prompts are a way of literally and figuratively prompting the model to answer questions, to think out loud, to create and explore options, to generate scenarios, etc. And it's very, very clear that prompts and large language models require a different kind of engagement and a different kind of emphasis, in particular in the finance community. So instead of just looking at an Excel spreadsheet or, indeed, Workday's own cloud capabilities, you have to say, "Gee, how would generative AI capabilities add value to complement, reinforce, or make us rethink some of the assumptions associated with the way we're managing financial assets right now?"
So in the same way that hack-a-thons invite different groups of people to hack together a solution to something, the idea behind a prompt-a-thon, a somewhat shameless borrowing, is hacking prompts. It's getting your customers, Workday's customers, to collaborate on use cases and ask themselves, "Gee, we have this generative capability, and we also know what Workday can do. How can we mash them up in creative ways? How can we learn to prompt in a way that creates prompts that learn? How do we create a virtuous cycle of learning and prompting?" And that was the goal. And I think we were fairly successful, because people came up with unusual use cases. They came up with novel and, in some cases, provocative prompts to explicate or illuminate or drive those use cases. And the responses that we got—and today, we used Anthropic's Claude—made everybody in the room, in my professional opinion, think twice: "This is a capability we're probably going to be spending more time with over the next 12 to 18 months, not less."
Navarro: That seems to make data an ever more important component.
Schrage: More important than ever. With LLMs, the acronym GIGO (garbage in, garbage out) has never been more relevant as AI rises in importance and impact. These models are trained. What's the data that trains them? How do we make sure that the data quality and data lineage are appropriate and fit for purpose for the kind of problem we seek to solve, or the kind of opportunity we seek to address, or the kind of compliance we wish to honor in a regulated environment, or the sort of forecast we want to make to better cope with the vagaries of an uncertain future? So data becomes more important in this environment.
But there is one other thing I want to say, which is: as important as data is, even as we improve the power of "artificial intelligence," human beings' own critical thinking skills become more important. And if you were to ask me one of my undisclosed hopes for running the prompt-a-thon, it's not just that people would learn and have a good time. It's that they would recognize they have to think twice about the importance of critical thinking, that critical thinking is not just about, "Here's the problem. How do we solve it?" or even, "How do we solve it better?" It's, "What's the real problem that we're trying to solve? Is our definition of the problem and the problem space accurate? And if it's not, are we thinking more rigorously about the fundamentals and assumptions that might go into a better approach to addressing it?"
Navarro: It sounds like the idea of garbage in, garbage out, is even more important when it comes to financial practitioners.
Schrage: Absolutely.
Navarro: So how did you think about coaching them to think about risk and compliance issues associated with generative AI, both from a use case and a prompt perspective?
Schrage: We did not focus on the data aspect. What we were really trying to do was get people to push the boundaries, push the envelope of generative capabilities. That is, this isn't just about how you compute something. It's: How do you engage with an entity, with software, with a model, with a model of human intelligence, in a way that gives you insights you wouldn't get from a spreadsheet, from a SQL query, a Google search, or even Workday's own adaptive frameworks? We're talking here about a valuable new addition to the digital ecosystem for organizations that care about the management of tangible assets, intangible assets, and all forms of capital, from human to financial.
Navarro: It’s so interesting that you bring up the idea of, “How do you engage?” What are some best practices you can share on how to develop prompts that deliver better outputs for the business?
Schrage: That's a great question. And that, to my mind, was one of the most important, obvious rediscoveries that the participants made. They're called LLMs, large language models. The words you choose become disproportionately more important, just like when you're talking with a friend or somebody you care about. With somebody who's very familiar with you, when you say, "How are you doing?" the historical context is implicit and understood. In a prompt-a-thon, by contrast, you want to be really, really intentional. I use the phrase thinking twice. You want to be really, really intentional about what you want the virtue and value of your engagement to be.
When you say, "Give me five ideas," or, "Recommend five good ways of doing X," perhaps you might want to say instead, "Recommend five good ways of doing X that address this strength and minimize my exposure to this risk." Or suppose you want to do something much quicker than you ordinarily do: "Ordinarily, this task takes me 20 minutes. Give me three suggestions that might cut the amount of time I spend on this task in half, or by two-thirds, or by 80%." The summary and synthesis of this: use language that is specific, explicit, granular, and detailed in a way that makes it easy for the model to do what it does best: pattern-match and extract real signal from literally billions of parameters and trillions of pieces of data.
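[Editor's note: To make that specificity advice concrete, here is a minimal sketch that sends a vague prompt and a more constrained one to Claude via Anthropic's Python SDK (Claude being the model used in the prompt-a-thon). The prompts, the finance scenario, and the model version are illustrative assumptions, not examples from the session.]

```python
# Hypothetical sketch: the same request, phrased vaguely and then with the
# specificity Schrage recommends (context, constraints, measurable targets).
# Requires the `anthropic` package and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

vague_prompt = "Give me five ideas for closing the books faster."

specific_prompt = (
    "You are advising the controller of a multi-state grocery retailer. "
    "Recommend five ways to cut our month-end close from ten days to five. "
    "Each recommendation must (1) preserve segregation of duties, "
    "(2) name the reconciliation step it shortens, and "
    "(3) estimate the days saved."
)

for label, prompt in [("vague", vague_prompt), ("specific", specific_prompt)]:
    response = client.messages.create(
        model="claude-3-opus-20240229",  # model choice is illustrative
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} prompt ---")
    print(response.content[0].text)
```

Comparing the two responses side by side is essentially what the prompt-a-thon did on screen: what you put in, what it gave out, and where the gaps were.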
Navarro: So the human aspect of that isn’t going away anytime soon, it seems.
Schrage: I would argue just the opposite. I think the human aspect becomes more important. That said, because prompts are so important, because prompt sequences are so important, because prompt templates and formats are so important, what are we seeing? We're seeing AI techniques being applied to automating prompt generation. So what you have is an ongoing virtuous cycle, and tension, between where we get value from using technology to augment and enhance human decision-making versus where we get better return, greater efficacy, and greater effectiveness from automating these things. And large language models are capable of adding value to both. Large language models can help you become more mindful. They can also facilitate your becoming mindless.
Navarro: Well put. Thinking about people who are adopting AI at this point or looking into it, is there a guiding principle that they should be thinking about? I mean, we talked a little bit about data. We understand a little bit about the human element, but what should finance practitioners come away with?
Schrage: So there really are two key things that I would emphasize. In fairness, they're not unique to AI in general or generative AI in particular. The first is a standard question: not what do you want to do, but what do you want the outcome to be? What is your desired outcome? Not just the output, the outcome. What do you wish to accomplish? What do you wish to achieve? Generative AI, like AI, is not an end or a goal. It's a means to an end. So what I push my classes and my clients to emphasize is: what do you want from the outcomes? What are the most desirable outcomes, and why? The second, to put some discipline on this, is: what are the use cases that connect best to these desired outcomes? And I believe the effective application of AI capabilities and generative AI capabilities is, "Let's see where these use cases can get the greatest value from the focused application of generative AI, and how that focused application in that use case leads to or contributes to the desired financial or business outcome." That's where I would begin: How do we define and navigate desired outcomes and explicit, specific, granular use cases?
Navarro: We've heard a fair amount about AI having hallucinations. How do we avoid that pitfall, or how do finance professionals avoid it?
Schrage: I am a contrarian on the hallucination front. I think "hallucinations" oftentimes materialize because, bluntly, the prompts, the queries, invite misrepresentation and misunderstanding. We're right back to the language issue here. The other factor reflects the, dare I say it, etiology and basis upon which these large language models are trained. They're simulations. Virtually everybody has seen a docudrama where the disclaimer is, "Based on a true story." Large language model responses are based on true training, but the response oftentimes is a pattern match. It's looking for similarities, so it's similar, but not quite. It's a not-quite-identical twin, or it's like an actor playing somebody else, and for reasons of convenience and rhetoric, we call that hallucination. I would call it a simulation that's out of variance, out of bounds, outside the guidelines that you want. In the same way, if you run a simulation, it may exceed certain parameters you would expect; that's the nature of simulation. So where people say the model is making things up, it's like, "Well, actually, it's been trained to generate things based on how it's been trained." It's not making things up. Based on your query, this is a credible response consistent with how it's trained.
By the way, the complicating factor is that these are enormous models. What are the classic AI concerns? Transparency, interpretability, and explainability. Guess what? If you've got a large language model trained on over a trillion bits of information with nine billion parameters, its interpretability, explainability, and transparency are, shall we say, challenging for the ordinary human mind.
Navarro: That’s exactly what I was going to say. I was going to say it’s sort of hard to imagine the amount—
Schrage: I would flip it. You have to imagine. All that’s left is imagination because human beings weren’t bred to handle that. I’m talking about information overload. Oh my gosh, it’s astonishing. It’s absolutely astonishing.
Navarro: You mentioned earlier that even the concept of AI and LLMs has evolved over the past year, that we're in a different place now than we were a year ago. What do you see for the coming year? What capabilities do you foresee being developed or put into use?
Schrage: So there are three big areas that I think are going to have—I'll stick my neck out—enormous impact going forward. And remember, the best people in the world aren't exactly certain how and why these models work as well as they do. So let's be clear that what is called best practice is contingent on the moment, and this is constantly changing. The first is that the models themselves improve. GPT-5 is coming, and we have yet to see how much better it is than GPT-4: whether it becomes more trainable or less trainable, learns faster, learns better, has emergent properties, is able to mimic logic and reason in ways that its predecessors could not because its architecture and training facilitate and enable that. So number one, the models themselves improve.
Number two, these models, like everything else in the digital world, can connect to other things. There are connectors; there are APIs. Well, what happens when we start connecting these large language models to SQL databases, relational databases, different kinds of computer programs, different kinds of Python code, or Microsoft's different kinds of Copilots? They're part of an ecosystem. Well, as we know in ecosystems, there are network effects, and the whole is greater than the sum of the parts. So as the capability of generative AI changes, the quality and capability of the ecosystem change. That's not predictable at this time. But clearly, as generative capabilities improve, the things in that ecosystem might also improve. Are these effects going to be additive or multiplicative? Who knows? My bet: more multiplicative than additive. [An illustrative sketch of this connection pattern appears after this answer. —Ed.]
And the third is: I've got this huge, really, really smart large language model. What happens when large language models start to talk to one another? They're separate and distinct, but what if they begin training each other? We've just begun that. What happens when GPT-5 has an argument or a discussion with Claude 4? I have no idea. And I'll tell you what, neither do the folks at OpenAI or Anthropic. Oh, and I've left out Llama 2 and Llama 3, the open-source models from Meta. So you have this constellation of large language models that may actually learn from, with, and for each other. So those three variables are huge, and I believe they're going to lead, for the foreseeable future, to ever greater capabilities in the generative space.
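[Editor's note: A hedged sketch of the "LLMs connected to SQL databases" pattern from Schrage's second point: the model drafts a query, and the host program, not the model, executes it against the database. The table, data, and the hard-coded stand-in for the model's output are hypothetical, so the example runs without any API key.]

```python
# Minimal sketch of the LLM-to-SQL connection pattern: a model drafts SQL
# from a natural-language question; the application validates and runs it.
import sqlite3

def draft_sql(question: str) -> str:
    """Placeholder for an LLM call that turns a question into SQL.
    In practice you would send the schema plus the question to a model
    and validate the SQL it returns before executing it. Hard-coded here
    so the sketch stays self-contained."""
    return "SELECT vendor, SUM(amount) FROM invoices GROUP BY vendor"

# Hypothetical in-memory invoice table standing in for a real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (vendor TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Acme Produce", 1200.0), ("Acme Produce", 800.0), ("Fresh Co", 450.0)],
)

sql = draft_sql("What have we spent with each vendor this month?")
for row in conn.execute(sql):
    print(row)  # ('Acme Produce', 2000.0), ('Fresh Co', 450.0)
```

Keeping execution on the application side, rather than letting the model touch the database directly, is one way to get the ecosystem benefit Schrage describes while containing the risk.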
Navarro: You sound like an optimist.
Schrage: Technologically, I'm an optimist, yes. By the way, I am old enough to remember when the late Gordon Moore, one of the three cofounders of Intel, published what became Moore's Law, which basically said that chip density doubles every 18 months. And that was nearly 60 years ago. That's exponential growth. He was right. Intel wasn't the winner; it didn't become the trillion-dollar company. NVIDIA became the trillion-dollar company, and AMD rose too, but the industry hewed to Moore's Law. We're seeing that same kind of exponential imperative in large language models. So, technologically, I remain an optimist. On other aspects of life, I don't think you would consider me an optimist, but that's not for this podcast.
Navarro: That’s a great answer. So it sounds like people wanting to get in on AI and LLMs are trying to jump on a moving train a little bit.
Schrage: That's exactly what they're doing, yes, or to mix the metaphor, jumping on a plane that's taken off. And it's not clear whether it's a jet plane, an SR-71 Blackbird, or one of Elon Musk's launch vehicles.
Navarro: So given all that, how do you prepare a workforce? How do you prepare finance workers to develop the skills to interact with AI?
Schrage: And I want to be really, really careful how I say this, because I don't want it to be politically misconstrued. Less than a decade ago, people were losing their jobs in blue-collar work and in journalism, and the snotty response was, "Learn to code." Now we have Copilot. We have LLMs that are actually being used to help people code, and there are a number of folks at places like MIT and Stanford who believe that the need for coders is going to decline. And indeed, we've seen non-trivial layoffs in the digital sector. This is not meant to be a political observation, but the equivalent glib response today would be, "Well, gee, they need to learn generative AI, or they need to learn to train LLMs."
I'm going to stick with something that I said to you earlier. I believe that the ability to have and cultivate critical thinking skills and communicative skills, both with people and machines, becomes indispensable. You need to boost your critical thinking skills and be able to review, analyze, and challenge the outputs of a large language model. You need to be able to collaborate with that large language model, not just tell it what to do. And you need to have the courage and the judgment to recognize, "Here's where I should be telling the model what to do, and here's where I should follow the recommendations of the model," much the way we already do in fighter jets and commercial aviation cockpits, where the autopilot, the copilot, plays a huge role in helping the pilot fly the plane. And so the way I would encourage large organizations to deal with this is not training people in coding or training people in math. It's, "What do we want critical thinking skills to mean as these capabilities increase digitally?"
Navarro: Thank you so much, Michael. Next, we're going to bring in Stacy Hilgendorf, chief accounting officer and corporate controller at Sprouts Farmers Market. Welcome.
Hilgendorf: Hi, thank you.
Navarro: Thanks for being here. Can you tell us a little bit about Sprouts Farmers Market and your role there, Stacy?
Hilgendorf: Yeah, absolutely. Sprouts is a unique, specialty, small-format grocer focused on healthy, innovative food, very focused on produce, meat, and fresh items, but with lots of interesting grocery items as well. We're located in about 23 states with a little over 400 stores. My role there, as you mentioned, is chief accounting officer, and I have responsibility for everything that involves accounts payable, payroll, tax, general ledger, and those types of things.
Navarro: Great. So you participated today in a prompt-a-thon on how to curate the right AI use cases and prompts to deliver better business outcomes. I love the idea that a purveyor of natural and organic foods is looking to leverage AI and ML more effectively as part of your growth strategy. How did today’s session affect your comfort level with generative AI and its role among finance teams?
Hilgendorf: Yeah, it absolutely helped me get more comfortable. I'll be honest, my exposure to AI up until today, or certainly until recently, has been limited to a lot of the things you just hear: AI is going to automate this or automate that, or eliminate this or eliminate that. And I didn't really understand what the practical uses were. I think today helped bring that home quite a bit, particularly with the exercise we did, going through some real live examples and practical applications.
Navarro: Could you share some of the best practices you picked up on how to curate the right prompts to develop better outputs and guidance from large language models?
Hilgendorf: I think probably my biggest takeaway today had to do with how you formulate the question, or the inputs you use, to get the output you want. Probably number one was being specific. The more specific and clear you can be about what you're asking for, the better the answer you're going to get out the other side. And that makes sense; you hear that all the time. The examples we used in the session earlier I thought were really useful, and being able to see those results come out right in front of us was very informative. It's really helpful to see, "This is what they put in, and here's what it gave out." You can see what the limitations are, link that back to what you asked for, and see where maybe you left a gap that it didn't close all the way.
Navarro: You sound enthusiastic about the technology.
Hilgendorf: Well, for the first time, in this session these last two days, and then at a different conference I went to a couple of weeks ago where I spoke to another AI person, those two situations gave me real practical use cases, situations where I could see, "Hey, I could use this," and not off in the distant future, but now, in certain scenarios. And I can certainly see a path to a lot more use sooner than I thought.
Navarro: With this enthusiasm that you’re exhibiting, how are you thinking about preparing your team to start using AI and LLMs?
Hilgendorf: That's where it gets complicated, right? But in a word, I would say training. And again, similar to what I just said about my own situation, to a lot of people right now AI probably feels like a big black hole: What is it? You hear these terms. You see stories on the news. You hear a lot of hype, let's say. I think you have to train people in what it really is and what it really can do. In my situation, to take that a step further, part of my group is accountants who process a lot of high-volume transactions, payroll and AP in particular. And I think those folks, like anyone, have a little bit of fear because they don't know what this means for their jobs. There's a lot of training and education to be done there to move carefully into that world and make sure those folks are treated in the right way: they're brought along with the process and not eliminated from it. They're trained, they're educated, and they're shown how they can learn it and get value out of it in the jobs they do. Then they can, in turn, use that value to add value to the company by being better and more efficient at what they do.
Navarro: How much does comfort with technology play a role in hiring finance practitioners in your organization?
Hilgendorf: Good question. It depends on the job. There are some roles that lend themselves a lot more to high-technology experience. But these days, every role is touched. Even the ones I just described, the processing folks, obviously they're using technology. They're the big users of Workday. They're the ones in it almost every minute of the day. And so they certainly need to be comfortable jumping around, linking data, reconciling, getting two different systems to talk to each other, probably more than anyone else. But yeah, I don't think there's any role that isn't touched by a high level of technical need.
Navarro: It sounds like there’s a little bit of room for elevating some of these finance roles from just doing manual processes.
Hilgendorf: I think they're already doing a lot of that with the technology that's already baked into Workday: character recognition, automatic linking of POs to invoices, and things that already occur. But yeah, the more we move toward AI, the more that will increase and accelerate.
Navarro: One of the use cases you worked on was around contract generation and analysis and the role of machines versus humans in ensuring compliance and mitigating risk. How important is it to you to know that Workday’s approach to AI is human-centric and that humans are the ultimate decision-makers in these kinds of instances?
Hilgendorf: That links right back to what we just talked about, right? I think if you look at this as anything other than human-centric, you're starting off on the wrong foot. You have to keep the people in mind. The core of the company is the people: the people who do the work, the people who sell the product, the people who interface with the customers, top to bottom. So you have to start with that and go from there. And again, it involves a lot of training, education, and thoughtful change management.
Navarro: And does your company have an approach or have you thought about the implications of ethical AI and responsible AI and how you implement the technology within your organization?
Hilgendorf: The short answer is probably no, not enough. It's not that we haven't thought about it, but it's definitely not anything we've arrived at answers to. My company has probably not had those conversations at the right levels, with the right people in the room, to figure it out yet. But we're getting to where that needs to be the next step, for sure.
Navarro: Looking forward, and taking what you've learned today during the prompt-a-thon, are there guiding principles you've thought about, or are thinking about implementing, as you take this technology to your team and start to encourage its use? What sorts of tasks are you looking at for AI or automation?
Hilgendorf: I'm in the infant stages of this, right? But I think there's room for some high-level usage. The use case my group talked about had to do with analyzing data around investor relations and how to deal with the financial analysts who follow our company on Wall Street, using historical data plus predictive data to understand trends and questions and anticipate what those conversations would look like. But at a more core level, I think there are a lot of uses around automating reconciliations and resolving outliers: payments and invoices not matching POs, short shipments not matching the invoice. There's room for lots and lots of that in our group. And outliers in the sense of looking at huge volumes of financial data and seeing what's unusual in the month.
Does that mean you double-booked a journal entry or missed a journal entry? Does that mean some activity spiked that you need to investigate or tell someone else to investigate? Maybe it's your electrical usage, and I need to talk to the ops guys about that, or maybe we paid a bill twice and need to talk to the AP team about that. But I think there are countless places where you can apply this to very practical, day-to-day functions that, right now, we rely on our internal controls and the systems we have to catch. And I think there are ways we could be led to those issues a lot more easily.
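[Editor's note: A minimal sketch of the invoice-to-PO matching Hilgendorf describes, flagging mismatches and unmatched invoices for human review. The data, PO numbers, and tolerance are hypothetical, and the sketch assumes data exported from a system of record rather than any particular Workday API.]

```python
# Flag invoices that don't match their purchase orders, the kind of
# day-to-day exception handling Hilgendorf wants to automate.
PO_TOLERANCE = 0.02  # flag anything more than 2% off the PO amount (illustrative)

purchase_orders = {"PO-1001": 5000.00, "PO-1002": 1250.00}
invoices = [
    {"po": "PO-1001", "amount": 5000.00},  # clean match
    {"po": "PO-1002", "amount": 1400.00},  # over PO: short shipment? price change?
    {"po": "PO-9999", "amount": 300.00},   # no matching PO at all
]

for inv in invoices:
    po_amount = purchase_orders.get(inv["po"])
    if po_amount is None:
        print(f"{inv['po']}: no matching PO, route to the AP team")
    elif abs(inv["amount"] - po_amount) / po_amount > PO_TOLERANCE:
        print(f"{inv['po']}: invoice {inv['amount']:.2f} vs PO {po_amount:.2f}, investigate")
```

The point of the sketch is the division of labor: the machine surfaces the outliers, and a person decides whether it's a double payment, a short shipment, or something for the ops team.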
Navarro: And in your organization, who are the ultimate decision-makers? Who's the human element, the last line of defense, when it comes to automating these processes and employing new technology?
Hilgendorf: Certainly our CFO would have the final say. But he'll rely on his team, myself included, to bring him recommendations and help educate him on the ways I think it could be practically applied. And I'm sure it'll be a joint effort when we start crossing those lines into how to apply it, how to use it, and the decisions that have to be made around putting it into use.
Navarro: As chief accounting officer and corporate controller, what concerns did you have about the technology before today?
Hilgendorf: I think, like most people probably have, it's the ethics and the privacy, for sure. Sprouts is a public company, so our data has to stay private and secure until we make it public, and when we make it public, obviously, there are avenues for that. I can't say I understand enough about the AI vendors. You hear about ChatGPT and some of the others, and now Workday is clearly moving into the space. So I'm not sure what the privacy protections are. If I go onto ChatGPT today and put something out there that's Sprouts-specific, is that available in the public domain because I put it out there as a question? But it was nice to hear that Workday seems to be entering that world carefully, and I think that's the right way to go. The applications we talked about with respect to broader uses, like our earnings calls, versus the uses within our back office to help resolve reconciliations or variance problems: the latter is going to be very tied into the Workday data. And knowing that that data is secure will help us move into that space a lot quicker, if and however we choose to do that. But we'll feel a lot better about it knowing that the data's safe there.
Navarro: So it sounds like today put your mind at ease a little bit.
Hilgendorf: Again, I think there’s probably a thousand questions, but before today, there were probably 1,500. I mean, I’m making up numbers, but certainly, it’s a move in the right direction to help feel that Workday is going to partner with us to understand those risks and is going to work with us to make sure that it all stays secure and usable.
Navarro: Great. Well, Stacy, that's about all the time we have for today. We've been speaking with Michael Schrage and Stacy Hilgendorf.
Remember to follow us wherever you listen to your favorite podcasts. And remember, you can find our entire catalog at workday.com/podcasts. I’m your host, Bruno Navarro, and I hope you have a great Workday.