Last month, OpenAI launched its latest AI-powered chatbot product, GPT-4. According to OpenAI, the bot, which uses machine learning to generate natural-language text, passed the bar exam with a score in the 90th percentile, passed 13 of 15 AP exams and earned a near-perfect score on the GRE verbal test.
Researchers at BYU and 186 other universities wanted to know how OpenAI's technology would fare on accounting exams. So they put the original version, ChatGPT, to the test. The researchers say that while it still has work to do in the realm of accounting, it will change the way faculty teach and students learn, and for the better.
“When this technology first came out, everyone was worried that students could now use it to cheat,” said lead study author David Wood, a professor of accounting at Brigham Young University. “But the opportunities for cheating have always been there. So for us, we’re trying to focus on what we can do with this technology now that we couldn’t do before to improve the teaching process for faculty and the learning process for students. The testing has been amazing.”
Since its debut in November 2022, ChatGPT has become the fastest-growing technology platform ever, reaching 100 million users in less than two months. In response to the intense debate about how models like ChatGPT affect education, Wood decided to recruit as many professors as possible to see how the AI fared against actual undergraduate accounting students.
Wood’s call for co-authors took off on social media: 327 co-authors from 186 educational institutions in 14 countries took part in the research, contributing 25,181 classroom accounting exam questions. They also recruited BYU undergraduates (including Wood’s daughter, Jessica) to feed another 2,268 questions from a textbook test bank to ChatGPT. The questions covered Accounting Information Systems (AIS), auditing, financial accounting, management accounting and taxation, and varied in difficulty and type (true/false, multiple choice, short answer, etc.).
Although ChatGPT’s performance was impressive, the students fared better. Students scored an overall average of 76.7%, compared to ChatGPT’s 47.4%. On 11.3% of the questions, ChatGPT scored above the student average, performing particularly well on AIS and auditing. But the AI bot performed worse on tax, financial, and managerial assessments, possibly because ChatGPT struggled with the calculations those topics require.
When it came to question type, ChatGPT did better on true/false questions (68.7% correct) and multiple-choice questions (59.5%), but struggled with short-answer questions (between 28.7% and 39.1% correct). In general, higher-order questions were harder for ChatGPT to answer. In fact, ChatGPT sometimes provided authoritative written descriptions for incorrect answers, or answered the same question in different ways.
“It’s not perfect; you wouldn’t use it for everything,” said Jessica Wood, who is currently a student at BYU. “Trying to learn just using ChatGPT is a tricky undertaking.”
The researchers also uncovered some other fascinating trends through the study, including:
- ChatGPT doesn’t always recognize when it is doing math and makes nonsensical errors, such as adding two numbers in a subtraction problem or dividing numbers incorrectly.
- ChatGPT often provides explanations for its answers, even when they are incorrect. Other times, its explanations are accurate, but it then proceeds to select the wrong multiple-choice answer.
- ChatGPT sometimes makes up facts. For example, when providing a reference, it generates a real-looking citation that is completely fabricated; the work, and sometimes even the authors, do not exist.
However, the authors fully expect GPT-4 to improve significantly on the accounting questions posed in their study and to address the issues noted above. What they find most promising is how the chatbot can help improve teaching and learning, including the ability to design and test assignments, or perhaps to draft portions of a project.
“It’s an opportunity to think about whether or not we’re teaching value-added information,” said study co-author and fellow BYU accounting professor Melissa Larson. “This is a disruption, and we need to assess where we go from here. Of course, I’ll still get TAs, but that will force us to use them in different ways.”