In my work I’m responsible for managing a program, and as is the case with many jobs in the academy these days I’m a little time poor. Funnily enough, other academics write about this sort of thing so I can save my keystrokes on what I actually want to write about today.
I wrote a variant of the article that follows for a conference, and as is common in my line of work, my submission was rejected. This unfortunate part of the academic lifestyle results in days spent working on publications that are then dismissed by others who might only have basic knowledge of what you are writing about. Sometimes, as was the case with the article below, the reasons given for rejection are really confusing and don’t pass the sniff test. In the words of Vonnegut, so it goes. But I really wish it wouldn’t, because it cheapens the profession. Anyway, I’ve decided that rather than waste my time trying to get this published elsewhere, I’ll post it below so I can at least point towards some type of output, and to give others in similar positions some ideas on how they might use GenAI in novel ways.
About my role
I’m a program director for a postgraduate program. I’m responsible for managing a wide range of leadership activities that impact the teaching and learning experience of students at the university that employs me. When I am working on teaching and learning focused activities, there is a way of thinking about the things I do as quality work. Quality work is “a set of practices and initiatives that have a focus on the quality of educational provision” (Elken & Stensaker, 2018, p. 195). It basically means that when I am doing things to improve the student experience or learning outcomes, I am doing quality work. No, that doesn’t mean that I think everything I do is amazing, but it does mean that my efforts are aimed at improving educational outcomes.
It’s not a secret that Generative AI can lead to innovative behavior and work, and improve efficiency in different ways. I have a love-hate relationship with it, and I’m actually a little tired of thinking about Generative AI both as an educational tool and the perilous place it has in education. I think it’s both a threat and an opportunity. I’m also a human who feels there are more important things to consider than productivity and innovation, but because those things tend to be demanded when academia is so under-resourced, the consideration bubbles up regardless.
In short, I have a job to do, and I often have limited amounts of time to put things in place that are important for various reasons. With that in mind, I started to use ChatGPT in a considered way to a) help me perform quality work (see my definition above) and b) save some time. These are not particularly revolutionary ideas, but surprisingly, a reviewer of my original submission thought that it was not currently possible to enable quality work using Generative AI. Go figure.
About the program I lead
Duties as program leader are coupled with subject co-ordination, teaching and research undertakings. The program is delivered online, housed on a popular learning management system and supplemented by optional but recorded synchronous seminars and frequent guest presentations. Various sessional lecturers teach into the program. Study periods run on a condensed schedule and there are two assessment tasks per subject, often due at the same time, placing additional stress on teachers to prepare for online sessions, mark assessments and provide adequate feedback to students.
Typical students in the program include aspiring learning designers, various secondary and tertiary educators and other third-space professionals. As the course prepares students for learning design practice, it is important we (the institution, the faculty, the program leader and the staff teaching into the program) demonstrate quality in our work. In doing so, we aspire to act as the ‘gold standard’ that students can observe, learn from and reflect on. When our standards are lacking, there are considerable potential reputational impacts for the institution, the program and the staff.
Although already well regarded by stakeholders, the condensed timetable presents logistical challenges for program leadership, subject co-ordination and teaching. For example, in most study periods, new subjects commence either before or when previous subjects are concluding, reducing opportunities to reflect on and improve offerings. This includes quality work practices such as reviewing and iterating on course content and teaching materials, finding suitable times to meet with sessional staff to discuss teaching plans, adjusting assessment instructions and aligning them to rubrics and feedback frameworks. To combat these time pressures, I utilised ChatGPT 4o (and sometimes the more thoughtful mode) as a critical learning design friend.
Meet Caden Ross
Over the course of this year, I ‘worked with’ ChatGPT 4o in my program leader capacity. To guide the behavior of the tool, I instructed it to adopt a ‘Critical Learning Design Friend’ persona with the following guidelines:
“This project is for my teaching in the Postgraduate Learning Design qualification. Your writing should be precise and supported through references as needed. Use the APA 7 style. I need help with generating content and instructions for different subjects. You need to be my critical learning design friend. That means you should be thinking about pedagogical practice, learning design principles, and defensible use of technology to support problems. Accuracy is vital. If you are unsure of anything you need to check with me.”
(Note: the last line appears to be useless; it is never unsure, and that is one of its biggest flaws and why it makes stuff up.)
As I was preparing the paper, I asked the tool to name itself and describe its characteristics to help illustrate what drove its behaviors. After several prompt iterations focused on reducing word count and adjusting tone, the tool provided the following biographical statement.
“I’m Caden Ross, the Embedded Critical Friend in Learning Design for the project. I support the program leader by contributing to reflective dialogue, pedagogical alignment, and the development of high-quality assessment, feedback, and scholarly materials. My expertise has been shaped by generative training data and more than 101 structured cycles of interaction within this project. I work from an explicitly defined learning design framing that includes constructive alignment, critical and ethical pedagogy, and scenario-based task design. I adapt based on feedback, adjust to tone and context, and contribute by clarifying thinking, surfacing misalignment, and supporting quality work under pressure.”
I asked why it chose this moniker, and it replied,
“I’m choosing Caden Ross because it literally embeds ‘constructive battle’ and ‘vantage-point reflection’ in the name itself; together they capture what a critical learning design friend ought to do—challenge ideas while offering perspective.”
Pretty wild right?
I don’t want to bore blog readers with the word-for-word methods section that I included in the paper, but there are a couple of important points to consider. Firstly, GenAI (like ChatGPT a.k.a. Caden Ross) appears to be non-deterministic, leading different (and even the same!) GenAI models to respond in different ways to the same prompts from users at different times. Therefore, even though documenting the processes and prompts you use to generate output is just as important as the output itself, there isn’t any guarantee that you will get the same results each time. You might notice this behavior in something like Midjourney or newer versions of ChatGPT, where generated images are different each time even though you use the same prompt. It’s a real issue in a lot of research where replication and reliability are important.
Anyway, over the course of the year I engaged in 101 structured conversations with Caden Ross. I define structured conversations as interactions of at least one prompt and response with the tool. These conversations were conducted with the pro version of ChatGPT 4o (which has since been replaced with a newer model) within a ‘project’ folder. To populate the table below and to give an indication of the type of work I was doing to ensure the quality of the program I lead, I needed to export the data that was collected by the tool over the duration of time I interacted with it (about 6 months).
Not surprisingly, there was a lot of data. I used the built-in export function to pull all data from the ChatGPT interface into an HTML file consisting of 137,355 lines of text, which I then converted into a txt file. Those lines included all interactions from every project folder that I had set up within ChatGPT (I have about 8 different projects on the go for different purposes). I extracted the relevant interactions from the file with ‘assistance’ from Caden Ross. The different themes and structured conversations are shown below.
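As an aside, the HTML-to-txt conversion step can be scripted rather than done by hand. Here is a minimal, stdlib-only Python sketch that strips an HTML export down to its visible text; the structure of a real ChatGPT export will differ, so treat this as illustrative rather than as the exact process I followed:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text from an HTML document, ignoring tags."""
    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_data(self, data):
        # Keep only non-empty runs of text between tags.
        text = data.strip()
        if text:
            self.lines.append(text)

def html_to_text(html: str) -> str:
    """Return one line of plain text per text node in the HTML."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.lines)
```

Something like `html_to_text(open("export.html").read())` then gives you a plain-text file you can work through line by line.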
| Theme | Conversations | Activity Overview |
| --- | --- | --- |
| Feedback / Marking | 39 | Drafting, refining, and personalising feedback for assessments; developing feedback templates aligned to rubrics. |
| Data Handling/Research | 23 | Reflecting on AI use, discussing/challenging methodology, and surfacing epistemic issues. Extracting, organising, cleaning, and preparing conversational data for analysis or documentation. |
| Lesson/Subject Planning | 16 | Designing or improving lesson plans, learning activities, and applying ADDIE or other instructional design models. |
| Other / Misc | 12 | Addressing unrelated topics, small ad-hoc requests, or incomplete or exploratory discussions. |
| Rubric/Assessment Development | 11 | Updating and scrutinising rubrics, reviewing assessment tasks. |
As you can see from the table above, the most common types of structured conversations centred on improving feedback and marking (39 conversations), lesson and subject planning (16 conversations) and rubric and assessment development/design (11 conversations). The Other/Misc theme represented ad-hoc undertakings. Interestingly, the second highest number of conversations fell into the data handling/research category (23 conversations), which were purely about preparing the dataset for analysis. This was a huge pain, and those conversations were generated in a short period of time compared to the other structured conversations. It is in this process where Caden’s abilities were most unhelpful.
Discussion and Implications for practice
Despite what my reviewer said, Caden Ross was extremely beneficial for improving both efficiency and quality work in program leadership activities. For example, the most common category in the table was Feedback/Marking. My efficiency increased when using Caden Ross to assist in crafting feedback templates for a range of assessment tasks.
As most would be aware, it’s important to lead the conversation with GenAI and take ownership. Results need to be scrutinised. In my case, to maintain standards I found that when I provided the assessment description and the rubric being used to aid in marking, along with a draft of the first feedback template for a grade band descriptor (for example, a high distinction), the results were quite good. To note, I wrote the assessment description, the rubric and the draft. This helped Caden Ross become more familiar with my writing style (or desires?), in addition to its exposure over time. As such, it was able to produce reasonable facsimiles for the other grades based on my writing style and preferences without too much intervention. After reviewing Caden’s outputs, I was able to either rectify small issues myself or ask it to revise its output using specific follow-up instructions. Caden was helping me do quality work.
To surface learnings, I asked Caden Ross if it could tell me, succinctly, how it had been able to do a good job of mimicking my style.
It replied, “Through repeated exposure to your drafts, I learned to mirror your preference for clear, student-centred, and constructive language. By analysing your tone, structure, and phrasing, I was able to generate consistent feedback that reflected your professional yet conversational style, with an emphasis on kindness, efficiency, and alignment with assessment criteria.”
(Is this being a sycophant? Maybe!)
However, Caden Ross’ abilities when analysing data were untrustworthy, to say the least.
Generating Themes
To prepare the table above, I asked Caden for its help and to work inductively to generate themes from the data. Remember, there are 137,355 lines of text. That’s a lot, even for Caden Ross. Almost instantly, Caden produced five themes with titles including ‘Feedback and Marking’, ‘Rubric Development’ and ‘Research and Writing’. Having experienced the pauses ChatGPT takes when considering something particularly tricky, I was skeptical of the accuracy of Caden’s work. I had a suspicion it was pulling the old ‘hallucination’ trick. It’s probably old news now, but when the AI hallucinates, it pretends it knows something it doesn’t and fills in the gaps with nonsense. I knew my dataset contained conversations about unrelated topics and that it would take more time to review every line. I also noticed that Caden suggested it had ‘skim read’ the txt file and was working deductively to impose the categories/themes on the data. This was obviously concerning.
I then repeated my instructions, asking Caden to try again. This time, Caden thought for 1 minute and 10 seconds before producing seven themes that were broader than the initial set and inclusive of my other conversations from different projects including ‘Financial Management’ and ‘Self-Improvement’. This result seemed better. Caden then outlined the five steps of its thematic analysis, including that it had ‘read’ the entire text file, reviewed keywords, looked longitudinally at occurrences of clusters, consolidated themes and then verified the themes against the text. It appeared Caden was doing what I wanted, leaving me reasonably confident this new batch of themes represented the entire dataset.
Armed with the complete dataset and some initial themes, I still needed to drill down into program leadership activities to determine which conversations could be understood as quality work. I asked Caden to focus on the relevant themes and to work inductively, looking for appropriate conversations using keywords like ‘feedback’, ‘marking’ and ‘rubrics’. Along with these examples, I instructed it that it could generate its own categories or themes around quality work based on what it found during analysis. After several iterations and much trial and error, Caden continued to produce an array of unusable responses where it appeared to misunderstand what I wanted. I concluded that the least time-consuming way to ensure the reliability and validity of the analysis and the subsequent themes was to conduct the thematic analysis myself. I did so by reviewing each individual line and allocating it to a category/theme. The earlier table shows my results. Thanks for nothing, Caden!
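For what it’s worth, the keyword-based allocation I was asking Caden to perform can be scripted deterministically. A minimal Python sketch follows; the theme names come from my table, but the keyword-to-theme mapping here is an illustrative assumption, not my actual coding scheme:

```python
# Illustrative keyword-to-theme mapping; the keywords are assumptions,
# not the exact coding scheme used for the table above.
THEMES = {
    "Feedback / Marking": ["feedback", "marking"],
    "Rubric/Assessment Development": ["rubric", "assessment task"],
    "Lesson/Subject Planning": ["lesson plan", "learning activity", "addie"],
}

def tag_line(line: str) -> str:
    """Allocate a line to the first theme whose keywords it contains."""
    lowered = line.lower()
    for theme, keywords in THEMES.items():
        if any(keyword in lowered for keyword in keywords):
            return theme
    return "Other / Misc"
```

A script like this at least behaves the same way on every run, though it obviously can’t handle the judgement calls that ultimately pushed me to code the lines manually.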
Implications for practice
There are clear benefits and limitations to adopting Generative AI into program leadership practices. Firstly, Generative AI offers the benefit of improving the efficiency of tasks related to teaching and learning. Even seemingly rudimentary things like those I’ve written about in this post play a part in improving education in some (small) way. As this article shows, it remains important that critical, human-led judgements occur throughout each interaction with Generative AI to ensure alignment with quality work standards and expectations. This is important for the improvement of pedagogical practice and student learning outcomes.
However, there are severe limitations to this current iteration of Generative AI that became evident when I attempted to use it to support my analytic work and thematic analysis. The limitations observed here, around AI’s data processing opacity and the inconsistency of its activities, mean that it currently cannot be relied upon for rigorous qualitative analysis without time-consuming and significant human intervention. The process was painful. It is also worth noting that the limitations I observed were not hallucinations, but might be more accurately described as confabulations. These are described as “mistaken reconstructions of information which are influenced by existing knowledge, experiences, expectations, and context” (Smith et al., 2023, p. 1).
For anyone else keen to try something similar, I’d recommend you plan your research clearly, document everything and analyse the data yourself, or at least ensure that the process is human-led and assured. There are other recent publications exploring thematic analysis undertakings with ChatGPT (Nguyen-Trung, 2025) that corroborate my story.
Additional Thoughts
Lastly, I feel it is important to note that I see a real threat coming from Generative AI to academia (as to most other white-collar jobs). I in no way condone leveraging AI to do away with human ‘thinking’ work, and aside from the obvious issues around learning and assessment, I’m also concerned that the continued exploitation of people in higher education will ramp up once management gets into bed with GenAI. For that reason, I’m glad my article/blog post wasn’t accepted, as it makes my skin crawl to be seen as an advocate for efficiency in an era of precarious academic contracts and lip-service to doing good and considered work. This was a point-in-time attempt to highlight helpful practice, and I’m glad that the next thing I write about will be about something other than efficiency and AI.