"

The Practice of Assessment

 

Knowledge… never arrives… it is always on the brink  ~ Lorri Neilsen

Prelude

Ok – so now that we have briefly interrogated the field of assessment as it relates to our work, let’s turn to the more practical – how do we DO assessment in Learning Strategy work? And how do we do it well?

It's a form of inquiry, so it begins with questions. What do we actually want to know? That's a prompt that will elicit a range of possibilities: What does Learning Strategy work do for students? By what mechanism do Learning Strategies work? What are Learning Strategies? And so on. But beyond these fundamental inquiries are the ones more salient to the stated purpose of the work, the ones that help us answer whether that purpose is being served. In our case, this leads to some variation of the following:

  • Does learning strategy work foster more effective, healthier approaches to study and learning for students? If so, how and in what ways?

Most of our meaningful questions are unquantifiable and beg further questions (What does "more effective" mean? What does "healthier" mean? What are the benchmarks for "effectiveness"?) – a kind of endless domino effect of subjectivity. This is not a catastrophic problem; scholarly inquiry naturally turns to these kinds of questions by applying the necessary rigour of qualitative research methodologies. But typically, Learning Strategy departments do not have a research and assessment division, or budget, so projects that rightfully demand time and fully developed efforts are done off the side of desks, pressured by short-term timelines, using instruments and methodologies unequal to the task. And, in the end, we often resign ourselves to what Rossi and Freeman (1993, p. 220) call "good enough assessment", a kind of surrender to the limitations imposed by this context. We do not often have the time or luxury to insist on perfect sample sizes, or pristine data sets, or statistical modeling, so we salvage what we can from the data at hand and draw tentative conclusions from these cursory analyses. This "good enough" approach is in keeping with Student Affairs assessment more generally.

The tail of these limitations, imposed by the co-curricular, non-research context, then wags the dog of our inquiries. What gets asked are the kinds of things that are easy to answer. How many students came to your programs? How satisfied were they with the experience? Simple counting and Likert-scale satisfaction surveys. This is not nothing. Regarding basic counts, it is important to know whether students are drawn to the programming on offer, to know what catches their interest, what the demographics are of attendees, who's not attending, the attendance fluctuations according to dates and times, etc. If the goal is to be as open to all students as possible, then these kinds of questions are important. Similarly, we want to ensure that students have a welcoming, positive, accessible experience when they attend our programming, so asking them something about their satisfaction is also important. As I said, it's not nothing. But the assessment efforts are weighted in favour of those kinds of questions at the expense of the much more important and complex ones: What did the students learn? What was the educational impact of our work? It is to these questions, and all their offshoots, that we also need to be turning our assessment efforts. While the evidence for the value and effectiveness of learning strategy instruction is quite robust in the K-12 context, it is less so in postsecondary ones, so our assessment efforts can play an important role in the broader efforts for research and scholarship on the questions that interest us.

The "good enough" approach is a kind of qualitative-assessment-lite that goes something like this: distribute a survey, run focus groups, record and collect responses, subject them to some form of analysis, and "tell the story". Again, not nothing, but it is not at all the same as rigorous social-science research, nor is it held to that standard. Assessment projects in these contexts are typically not required to undergo the rigours of a research ethics protocol; institutions distinguish between research for knowledge creation, which requires an ethics protocol, and assessment for program evaluation purposes, which does not. So, there are significant methodological compromises made in the service of feasibility, and any conclusions or insights drawn from these efforts need to be framed with words that suitably qualify them – preliminary, tentative, contingent, untested, inconclusive, etc. If one is to engage in "good enough" assessment, one must do so transparently and admit to these limitations.

"What did students learn?" is a very difficult question to answer with any kind of precision, and it becomes considerably more confounding when we add, "What did students learn as a result of our intervention?" We are fond, especially in a Student Affairs context, of using the language of "impacts" and "outcomes" because we want to be able to say with confidence that our work is the cause of positive impacts or desired outcomes for students. And these kinds of assertions are very often overly simplistic and premature. Our work as Learning Strategists involves "intervening" in the lives of students, stepping into their predicaments as they present them to us, and designing ways (conversations, workshops, resources, programs) to guide them towards a more effective approach to and relationship with their studies. So, those "interventions" become the relevant variable in assessing our work – we want to know the extent to which those interventions (the independent variables) cause improved approaches to study and learning (the dependent variables). As any social scientist will tell you, this is a notoriously difficult thing to demonstrate with absolute confidence, even with the application of rigorous statistical analysis and efforts to fully isolate the variables in question. It is even more confounded by the "good enough" assessment designs commonly applied in our work, where such rigour is rarely applied, and by the self-selection bias of the students subject to our interventions. And there are other dependent variables of potential interest to us, or to the institutions that fund our work – retention rates, persistence rates, graduation rates, grade point averages, etc. Again, while we remain interested in the ways in which our work intervenes in these things, and we may even design our work to have an effect upon them, it remains a significant challenge to reliably demonstrate correlations using typical "good enough" approaches to assessment. Are student participants in learning strategy interventions more likely to persist in their studies? More likely to graduate? More likely to have high grades? More likely to enjoy learning? These are interesting questions, and "good enough" assessment projects will go only part of the way to answering them. The rigours of full scholarly research efforts will go further.

Assessment practices in a Student Affairs context are not knowledge-building exercises from which we can make bold claims to truth, and we should not mistake them as such. Rather, the assessments we typically do here are simple, tenuous sketches that allow us to glimpse something compelling or tantalizing in the data we collect and, more importantly for us, in the stories we hear from students. Traditional scholarly approaches to rigorous inquiry may separate the researcher from the researched in an effort to be "neutral", dispassionate investigators, but in our context, that requirement is softened. Our forms of inquiry are (hopefully) grounded in a spirit of risk-taking, tolerance for ambiguity, and the notion of interconnectedness, perhaps with greater attentiveness to and preference for the "context of discovery" over the context of certainty, causality and justification. We are (hopefully) more interested in engaging forms of inquiry that allow us to notice meaningful things in or about student lives and then design and redesign our supports for them in ways we think will be helpful. And then we do it again. And we rely on a "good-enough-assessment" approach, a methodology fit for this purpose. I say all of this not out of some sense of embarrassment with our approach but simply to offer some open acknowledgement of our inclinations, biases, and limitations.

And as a final thought on this lengthy caveat, there is, of course, a body of scholarship underlying or informing our work – deep and robust analyses on metacognition, self-regulated learning, the effect of motivation, the entangled causes of procrastination, the benefits of multimodal learning, effective strategies for memory and recall, the effect of learning strategy instruction on various educational outcomes, embodiment as a form of learning, the implications of attention, and on and on, to say nothing of the deep wells of thinking from the psychology of learning, pedagogy, instructional design, neuroscience, etc. Those efforts are critical, and we should be engaged in and informed by them. But here we're talking about the practice of assessment as practiced by Learning Strategists and the departments in which they work, and it's a different category of inquiry.

So, what kinds of things do we do when we engage in “good enough assessment” practices?   

 

What do we want to know?

The practice of assessment begins with questions, but it is surprisingly common practice in higher education to collect various bits of participant data even in the absence of much clarity about the purpose of such collection, or the questions to which that data collection can be applied. This is wasted labour at best and irresponsible at worst. The collection of data should be preceded by thoughtful consideration of the intention and purpose of the inquiry, and of what we hope to discover through that collection.

In the domain of our work, these questions are varied and interesting and will ultimately come from you in the context of your specific work. But, generally, our inquiries as Learning Strategists will hover around two central concerns (each of which begs a cascade of further considerations):

  1. What kinds of learning skills and dispositions do students need to thrive in post-secondary education?
  2. In what ways can learning strategy work contribute to the development of those skills and dispositions?

These two central questions will provide the most basic starting point in our subsequent and more specific assessment efforts and inform the ways we collect data.

A further refinement of our initial questioning exercise is to determine, more specifically, the area of concern for our assessment efforts. It may be useful to consider three broad categories of concern in our work:

Scope – we want to know who participates in our programming, and who does not

Satisfaction – we want to know if participants have a positive “user experience”

Impact – we want to know if our work has made an appreciable and positive difference

As noted earlier, it is the third category that is the most consequential but also the most neglected because it is difficult.

Ultimately, in all cases, we will need to define the problem we are addressing as precisely as possible: the relevant population, the attributes of the relevant dataset, and the value the inquiry will have for audiences both general and specific. Are we interested in building awareness among all students and campus colleagues? Are we interested in engaging with more specific audiences of students and their needs? Are we interested in creating change or action in students who engage with us?

And finally, we can ask questions about the relevant audience for our findings. Who are the beneficiaries of what we discover, and how do we best communicate with those audiences?

Upon those foundations, we can engage in the actual processes of assessment. There are abundant sources of reference to guide you in these practices, especially if you occupy a place in a Student Affairs ecosystem. Notably, Assessment in Student Affairs: A Guide for Practitioners by M. Lee Upcraft and John H. Schuh, or Evaluation: A Systematic Approach by Rossi and Freeman, offer full coverage of the topic. So, for our purposes, we will be brief, with a few practical suggestions for Learning Strategists who also engage, or want to engage, in assessment practices.

And for that, I have turned to Eugenia Tsao, the Assessment Specialist at the Centre for Learning Strategy Support at the University of Toronto, who has provided much of the content for the following sections.

Methods and Data

The earlier caveats about the difference between research and program evaluation, and the adoption of “good enough” assessment practices in our work, highlight the narrow scope of typical assessment practice in this kind of “student-service” work. We traditionally rely on two primary methodologies: the survey and the focus group (and, to a lesser degree, interviews, case studies, ethnographies, and grounded theory).

Surveys are "quantitative" mainly in the sense that they're useful for making the case for things and establishing baselines over time. Interviews and focus groups facilitate the probing of perceptions, motivations and rationales to generate more qualitative data about the richness of student experiences.

Surveys:

Following are some general guidelines for the deployment of surveys:

  • Best practice is to sample enough people that your worst-case number of respondents is still likely to yield 95% confidence that findings are accurate within plus or minus 4%. However, a standard conversion (response) rate is about 15%, so most student-services surveys don't achieve this (unless you really do have tens of thousands of users). A minimal sketch of this sample-size arithmetic follows this list.
  • It’s best not to send too many surveys. To avoid survey fatigue, keep surveys short (but let your research needs determine the final length). Construct questions that are easy to answer, avoid double-barrelled questions, use plain language, and always offer an NA or neutral option, or respondents are more likely to quit midway through (indifference is a real attitude and shouldn’t be artificially pushed in any direction).
  • These are not all the same:
    • Neutral: Option for those who don’t have a negative or positive feeling
    • Not applicable: Doesn’t apply to that user
    • Don’t know: Yet to form an opinion
    • Prefer not to answer: If question is sensitive
  • Use a survey lede (start with a question that users are eager to answer). Put demographic questions at the end if you can: this is less feasible when branching logic is required, but better for conversion.
  • Likert items should use odd-numbered scales; the neutral midpoint is a key anchor. Avoid "reverse-scaled" questions or "attention check" questions as some sort of test for respondents. They annoy respondents, lead to survey abandonment, and cause respondents to behave worse later in the survey.
  • When choosing demographic questions: don't ask questions that don't align with the research purpose. Use institution-wide censuses when possible. If your user population is representative of your university's student body, there's no need to ask respondents for demographic information when it increases the likelihood of survey abandonment, especially when you can simply draw from institution-wide census data. It's best to ensure that survey questions are linked with action: if you ask people their ethno-racial identity when they're already asked about this in other pan-institutional surveys, tell them why you're asking.
  • It’s best to limit the use of multiple-response questions, as they don’t assess intensity of preference or attitude (respondents can select every answer). Best practice is to change to a categorical question or, better yet, a Likert. If you use the multiple-response format, you will be forced to assign a unique code to every possible response combination and apply inferential statistics to generate a meaningful analysis. This would not yield any “measurement” (quantification) of intensity and respondents can also decline to answer — a lot of labour for little benefit.
  • Open-ended responses are time-consuming to code, but you should always still have at least one, so that respondents can caveat any earlier replies. Omitting this can lead to respondent annoyance or survey abandonment.
  • Types of questions to avoid:
    • Loaded questions: “Where do you like to party?” vs. “When you have free time, how do you like to spend it?”
    • Leading questions: “How satisfied are you with the service’s responsiveness, keeping in mind that we serve 70,000 students?” vs. “How satisfied are you with how long it takes to get an email response?”
    • Absolute questions (always, all, every, ever, never in construction of question): “Do you always study at home?”
    • Questions with acronyms, abbreviations and jargon. “Have you visited the SSC?” vs. “Have you visited our student services centre, near the campus bookstore?”
  • Aspects of a good survey introduction:
    • Explains the purpose and importance of the survey
    • Gives the contact information for the person responsible for the survey
    • Indicates whether the respondent will be anonymous or their response is confidential (not the same thing)
    • Gives an indication of the time commitment to complete the survey
    • Describes the incentive for participation (optional)
    • Thanks the respondent for participating
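
To make the sample-size guideline above concrete, here is a minimal sketch in Python of the standard margin-of-error arithmetic for estimating a proportion, with a finite population correction. The population size and response rate below are illustrative assumptions only, not recommendations.

    # Minimal sketch: how many survey invitations are needed so that, at an
    # assumed response rate, the respondent pool still supports roughly 95%
    # confidence with a +/-4% margin of error. All inputs are illustrative.
    import math

    def required_sample_size(population, margin=0.04, z=1.96, p=0.5):
        """Sample size for estimating a proportion, with finite population correction."""
        n0 = (z ** 2) * p * (1 - p) / margin ** 2     # infinite-population estimate
        return math.ceil(n0 / (1 + (n0 - 1) / population))

    population = 20_000        # assumed number of students served
    response_rate = 0.15       # assumed conversion (response) rate
    respondents_needed = required_sample_size(population)
    invitations = math.ceil(respondents_needed / response_rate)

    print(f"Respondents needed: {respondents_needed}")
    print(f"Invitations to send at a 15% response rate: {invitations}")

In practice, as the bullet above notes, many student-services surveys simply cannot reach this threshold, which is one more reason to report findings with suitable qualifiers.
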
Focus Groups:

Following are some general guidelines for facilitating focus groups:

  • Best practice for focus groups is to invite about 20% more participants than you need.
  • Conduct at least 3-4 groups on any given topic for an in-depth study. You’ll know you’ve conducted enough groups and have reached saturation when you’re no longer hearing new or startling insights.
  • Sessions that run between 40 and 90 minutes are optimal. After that, the costs outweigh the benefits for participants.
  • Aim for between 10 and 15 questions.  The goal is to foster free-flowing discussion that is prompted by but not constrained by the prewritten questions. Participants should influence other participants’ thinking and stances.
  • Why and how (open-ended) prompts are better than yes/no-style questions.
  • Similarity helps to maximize disclosure. For example, a focus group meant to prompt honest reflection and disclosure about the experience of aging should be facilitated by an older person. A focus group meant to prompt honest commentary about the experience of being a racialized person in the Canadian job market should be facilitated by a racialized person. Power dynamics should be levelled to whatever extent is possible.
  • During the groups themselves, it’s best to do the following:
    • Give participants a heads up that you may call on them to ensure that even quiet attendees have a chance to weigh in, and then do so.
    • Emphasize that there are no right or wrong answers, as the desire is to hear a wide range of opinions.
    • Ensure confidentiality and give assurance that what’s said during the session stays there.
    • Consider recording the sessions to avoid missing anything, but assure the participants that they will remain anonymous in any public materials.
    • The facilitator should encourage open, spontaneous commentary. The goal is to generate as many ideas, opinions, contributions as possible from as many different people in the room in the available time.
    • Aim for representative feedback: try to hear from participants equally during the discussion.
    • Acknowledge and use non-verbal communication: eye contact and body language are essential to building rapport.
    • Clarify understanding of participants’ contributions, not by paraphrasing what you’ve heard but by asking for additional information and examples.
    • Correct any factual errors or misinformed feedback at the end of the discussion, rather than putting participants on the defensive by correcting them in real time.
There are, of course, other possibilities for assessment data collection in our work besides surveys and focus groups (and the like), but these are primary. To improve our understanding, for example, of how students interact with and navigate our spaces, whether real or virtual—their "user experience"—we can apply other kinds of instruments, such as:
  • A tree test, which offers insight into whether names we've used to categorize our work, services, programs, or offerings are intelligible and distinguishable from one another. Results are quantified and include:
    • Direct and indirect success rates: How many users found a promising or suitable answer without having to go back up and down the tree? How many users landed on an adequate answer, but had to navigate back up and down the flow of the chart to find it?
    • First-touch data: Which categories did users start with? Initial clicks or access points are a good indicator of how sensibly phrased our category names are.
  • Static usability tests offer insight into how and why users navigate a particular resource when working toward a particular goal, as well as which design elements are used, avoided, or unnoticed. Results include:
    • Heatmaps based on where users click, which will help to reveal which elements are conflicting or create too much noise.
    • Task success rates and identification of elements that cause friction, i.e., points in the user journey where users get stuck before completing a key activity.

Types of data:

In the broadest sense, we rely on two types of data in our assessment practices: quantitative and qualitative.

  1. Qualitative (descriptive)
    • Can be nominal. This includes data with no natural order, including concepts or names, e.g., learning outcomes on a syllabus, value propositions in a strategic plan.
    • Can be ordinal. This includes data that can be arranged in sequence or rank, e.g., attitudes or satisfaction scores represented as Likert scores, letter grades.
  2. Quantitative (numerical)
    • Can be continuous. Measured on a graduated scale, e.g., conversion rates or percentage of registrants who ultimately attended something, percentage of recipients who completed something, temperatures, heights, percentage grades.
    • Can be discrete. Countable in integers, e.g., number of registrants, number of complete responses received.
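
For those who handle their registration or survey data in a scripting language rather than a spreadsheet, these four types can be made explicit when the data are loaded. Below is a minimal sketch in Python using pandas; the column names and values are invented for illustration.

    # Minimal sketch: declaring nominal, ordinal, continuous, and discrete
    # columns explicitly. Column names and values are invented.
    import pandas as pd

    df = pd.DataFrame({
        "workshop_topic": ["note-taking", "time management", "exam prep"],  # nominal
        "satisfaction": ["agree", "strongly agree", "neutral"],             # ordinal
        "attendance_rate": [0.82, 0.64, 0.91],                              # continuous
        "registrants": [40, 25, 63],                                        # discrete
    })

    # Nominal: categories with no natural order
    df["workshop_topic"] = pd.Categorical(df["workshop_topic"])

    # Ordinal: categories with a defined rank
    scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
    df["satisfaction"] = pd.Categorical(df["satisfaction"], categories=scale, ordered=True)

    print(df.dtypes)
    print(df["satisfaction"].cat.codes)  # rank positions on the 5-point scale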

You can analyze multivariate data by performing a multiple regression, factor analysis, or cluster analysis, but remember that statistical software packages are not a substitute for sound reasoning. The standard confidence interval we typically apply is a convention, and a statistically significant correlation that counts as high-quality evidence by social-scientific standards may still count as low-quality evidence by natural-scientific standards. For instance, no randomized controlled trial (RCT) has ever demonstrated the value of a parachute or umbrella, and there are even some rigorous RCTs that cheekily suggest they don't work. Think: why might this be? In large part, it is because the norms and standards against which we measure "outcomes" are shaped by complex processes we may not fully understand or control for. In the natural sciences, these include physical conditions, laws of motion, mechanisms of action, kinematic relationships, and so on, whose influences can be completely described. In social analyses, these are (for example) political contexts, historical influences or socioeconomic determinants, whose influences on respondents' perspectives, attitudes, inclinations, or frames of mind cannot be completely described at any given time. We may administer a standardized questionnaire to a student as a pre-assessment exercise, which will yield certain results like "low motivation" or "high executive functioning", but these measures are always derived against some standard. But whose standard? And by whom was the finding made? Against some other standard, the student may have excellent motivation or low executive functioning.

In institutional research, the most common statistical techniques deployed to characterize small to medium sample sizes are basic measures of central tendency (e.g., median, mean) and measures of dispersion (e.g., the interquartile range, or middle two quartiles spanning the 25th to 75th percentiles; the range; the standard deviation). Often (though not always), our assessment work involves small or unrepresentative respondent pools, and we temper our expectations about the confidence level and reliability of our conclusions accordingly. But you can still treat the dataset as a sociolinguistic corpus and analyze it rigorously using the tools of sociolinguistics.
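
As a small illustration of those summary measures, the following sketch in Python uses only the standard library; the scores are invented, standing in for a small pool of 5-point responses.

    # Minimal sketch: median, interquartile range, range, and standard deviation
    # for a small respondent pool. The scores below are invented.
    import statistics

    scores = [3, 4, 4, 5, 2, 4, 5, 3, 4, 1, 5, 4]    # e.g., one 5-point Likert item

    median = statistics.median(scores)
    q1, q2, q3 = statistics.quantiles(scores, n=4)   # quartile cut points
    iqr = q3 - q1                                    # spread of the middle 50%
    value_range = max(scores) - min(scores)
    stdev = statistics.stdev(scores)                 # sample standard deviation

    print(f"median={median}, IQR={iqr}, range={value_range}, stdev={stdev:.2f}")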

It’s easy to fall into the trap of imagining that systematic investigations necessarily yield reliable, complete, or representative results, simply because we’re studiously following methodologies we trained up on in school. However, there are several epistemological traps that studies commonly fall into, which we should always be cautious of, including: looking for one’s keys under the lamppost (treating findings that are straightforward to gather as the findings that are best suited to answering a question), perverse incentives (being swayed by competing motivations or not seeing how competing interests will tend to sway how participants respond to inquiries), “saving the appearances” (mistaking accurate prediction for accurate explanation, especially when predictive models are mathematically satisfying and complete, in contrast with real-world elucidations of people’s priorities, attitudes and tendencies that may be less mathematically satisfying or complete), and reification fallacy (believing that our ability to name abstract concepts means that those concepts have fixed, pan-cultural traits that all study stakeholders, whether researcher or participant, will hear, understand, and define in the same way).

Some basic forms of analysis:

Again, an obviously deep and complex topic on which we’ll offer only a few cursory considerations useful to the Learning Strategist.

  • We will typically use descriptive statistics where sensible: markers of frequency, central tendency, dispersion, variation, position. For survey data, averages and medians are often less helpful than net favourable percentages (a minimal sketch of that calculation follows this list).
  • When analyzing qualitative data, you can use what market researchers conventionally call “deductive” or “inductive” approaches—in scare quotes since these uses of the terms do not accord with the ways you might know them from epistemology or the philosophy of science. With the former (“deduction”), you begin with a predefined set of thematic codes, then assign those codes to your dataset (e.g., satisfaction rubrics). With the latter (“induction”), you need to review your findings before identifying thematic codes. It’s a best practice to take several passes through your dataset—and for more than one brain or analytic tradition to do it, if feasible and where confidentiality permits—but if you don’t have that kind of staffing, perform at least three passes yourself on three different days to winnow down your codes, as you’ll bring a fresh perspective each time.
  • Transcribing a discussion is not a clerical task and it shouldn’t be outsourced. Your interviews and conversations are informed by what you brought to them, including relative levels of warmth, professionalism, body language, and other environmental preconditions and power dynamics that would’ve shaped the interactions. Someone else’s transcription is unlikely to capture what you know to be true of an interaction that you took part in and they did not. See, e.g., Mary Bucholtz (2000) on the politics of transcription.
  • When transcribing, you'll nonetheless need to make value judgements, and to set aside your sociolinguistic hat when donning your strategic communications hat. Pruning so-called "filler" words and sounds (e.g., "uh," "like," "well") can make a snippet of transcribed speech feel more eloquent, but removing all of them can remove meaningful information (e.g., hedges that are meant to precede inexactness, or are sometimes even meant to signal that what's being verbalized is being verbalized with some degree of ambivalence). And yet, when featuring a quote as a testimonial in a newsletter or another storytelling vehicle, you would indeed typically pare all or most of that backchannel.
  • Transcribing takes time. As recently as ten years ago, you could anticipate a 10:1 ratio of transcribing time to recorded time, but that’s come down with the help of transcribing software. Even so, if you’ve made confidentiality or privacy assurances to your participants, it is best and most responsible to do the transcriptions yourself, rather than uploading a recording to proprietary applications whose servers may reside in a different country or beyond the bounds of institutional confidentiality commitments.
  • With your transcripts or any other narrative corpus, including open-text survey responses: identify common themes that come up repeatedly and group them as needed. These may be broad patterns of views, values, experiences, doubts, attitudes, and priorities.
  • When doing a basic thematic analysis, start with a first pass through the data to get some overall familiarity with it. Then revisit in a second pass and begin jotting thematic possibilities. Then begin generating and grouping your themes (e.g., using coloured labels or binary marks in an Excel file, whereby each attribute would take either a one or a zero in the record/interaction/event it’s associated with, so as to generate counts in a separate row from which you could build a chart or a tidier table).
  • Once you're close to done, compare your themes against your dataset and ask yourself what's missing—maybe nothing—and what changes might strengthen or tighten the way you represent these thematic patterns. It is very important to refine the language you land on to describe these themes. Your choice of phrasing is what ultimately stands in for your findings to any given reader. No one knows what you "actually" mean. They'll just see a phrase like "Would value more mentorship for incoming international students."
  • Be willing to adjust, aggregate, disaggregate, discard, and rework your codes as you revisit them, always with your audience in mind. Be accurate but put yourself in your readers’ shoes. If you’re reading this practical guidebook, you’re more likely to be working on an institutional research project than a research monograph where you can expound on your findings’ historiographic import or add poignant geopolitical context. What does “Would value more mentorship for incoming international students” sound like to an overburdened program coordinator? Your instinctive response to this question needn’t alter your approach—it arguably shouldn’t—but you should experience that reaction internally before you experience it externally.
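
As promised above, here is a minimal sketch in Python of a net favourable calculation for a single 5-point Likert item. The response counts are invented, and conventions vary (some teams report only the favourable percentage), so state your convention whenever you report the figure.

    # Minimal sketch: "net favourable" for one 5-point Likert item, computed here
    # as % favourable (top two boxes) minus % unfavourable (bottom two boxes).
    # The counts are invented; state your own convention when reporting.
    responses = {
        "strongly disagree": 4,
        "disagree": 9,
        "neutral": 14,
        "agree": 38,
        "strongly agree": 22,
    }

    total = sum(responses.values())
    favourable = responses["agree"] + responses["strongly agree"]
    unfavourable = responses["disagree"] + responses["strongly disagree"]

    pct_favourable = 100 * favourable / total
    pct_unfavourable = 100 * unfavourable / total
    net_favourable = pct_favourable - pct_unfavourable

    print(f"Favourable: {pct_favourable:.0f}%")        # 69%
    print(f"Unfavourable: {pct_unfavourable:.0f}%")    # 15%
    print(f"Net favourable: {net_favourable:.0f}%")    # 54%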

Ethical Considerations

Assessment in our work typically does not require an institutional ethics approval process since, as mentioned earlier, there is a distinction between research for knowledge creation, which does require ethics approval, and assessment for program evaluation purposes, which does not. Still, the basic ethical principles that apply to institutional research practices should also operate in our assessment practices. And we should be guided by those core principles articulated by the Canadian Government's Tri-Council Policy Statement on research ethics, namely, respect for persons, concern for welfare, and justice. In short, our efforts in conducting assessments of or about other people should, above all and at all times, honour the inherent dignity of all human participants.

When conducting assessments, some questions that merit consideration include: Why choose this group of prospective participants, and not a broader or a narrower one? Is there a sound rationale for who is included in and excluded from participation in the research? Who will be exposed to the risks, and who will experience the possible benefits? What are the potential power imbalances between assessors and participants, and what are the implications of those? And, perhaps most importantly, have the assessment subjects given their consent to participate freely, and in an informed and ongoing way?

Reporting

In the end, the best data collection and analysis are only as good as how they get meaningfully articulated and shared with relevant audiences. The following are some suggested practices when reporting:

General considerations: The word “report” is being used generically here. This can mean many things in our context – from formal reporting on a comprehensive, long-term assessment project to be shared with senior levels of management to ad-hoc one-pagers requested by specific audiences for specific purposes. And everything in between.  In all cases, some sort of story is being told and should be informed by some basic questions:  

  • What does your audience want or need to know?
  • What is the context in which these results should be interpreted?
  • What is the overarching message that your audience should take away?

These questions can be extended, especially for the ad-hoc report-request variety. It’s good practice to interview the report-requester. This can save time, since all projects take longer than anticipated. Ask the requester the following:

  • Has this project been conducted before? If so, do you have a copy so that this new report can mirror key aspects of the previous one?
  • When do you need the report? What is driving the due date?
  • Is this a one-time request, or will it recur on a regular basis?
  • Who is the intended audience?
  • What are the top questions you want to answer?
  • Is this project confidential?

In most, if not all, cases, an assessment report is meant to elicit some action on the part of its intended audience, and the information design of the report can help convey this. This doesn't mean you always have to have a celebratory message, just an implication that involves a next step, even if that step is tiny. For this reason, it's helpful to present results in a way that makes it easy to take corrective actions that align with those results.

Format considerations: "Reports" vary considerably in scope and size depending on their context, of course. But, generally speaking, concision is desirable. Lengthier reports ask a lot of the reader, and central messages can get lost. Drawing from resources at Stanford University's Institutional Research and Decision Support: Learning Assessment and Evaluation, a report at any scale should be organized around the following sub-topics:
  • The “learning outcomes” and quantified performance indicators (while being mindful of the perils of “learning outcomes” described in an earlier section): essentially, some baseline account of the desired results, the aspirations.
  • A brief, transparent statement of data sources and assessment methods, acknowledging the inherent limitations.
  • A brief statement on scoring.
  • The results: matched as closely as possible to the specified performance indicators (e.g., net favourable % from the respondent pool for each defined learning outcome), while also accounting and making space for surprising results that lie outside the "learning outcome" confines.
  • A statement about insights drawn. What did you learn? What was startling?
  • A statement about what action, operational shifts, or new directions may be considered as a result of the findings.

And, as we know well, stories, more than mere data, can be the thing that moves people. Story is also the element that can be guilty of wild conjecture, manipulation, and radical subjectivity when applied poorly or irresponsibly. Direct participant quotes and testimonials can be uncontroversially transparent (depending on how they are contextualized) and add richness to numerical data. But, again, assessment work can naturally apply sophisticated qualitative data analysis in the traditions of narrative or discourse inquiry. In the end, it is the real, lived experiences of, in our case, students that have the greatest salience, and our reporting can be a very powerful tool to help meaningfully translate, interpret, and amplify those experiences. There is no reason that reports have to fall victim to that old complaint about "collecting dust" on someone's shelf. Nor does the act of reporting have to be regarded as a pointless task when reports include honest, authentic stories and real regard for the participants who tell them.

Visualization considerations: Good information design gives us faster access to actionable insights. When visualizing data, you are representing a relationship, which may be an exploration of something or an explanation of something. The main thing people care about when it comes to data is change. Data visualizations should be used to equip a specific audience and address their needs, so here are a few things to consider:

  • Are you trying to show how values compare? Use columns, overlay, line charts, tables.
  • Are you trying to show how values relate? Use scatter plots, heat maps.
  • Are you trying to show how data is distributed? Use histograms, scatter plots, and box and whisker plots.
  • Are you trying to show the composition of data? Use pie or donut charts (one attribute), area chart (over time), stacked bars or percentages (two attributes).
  • Be mindful of histograms. They are not bar charts: they show the distribution of binned values, which are ultimately continuous and not discrete in nature (a minimal sketch of the distinction follows this list).
  • Remember the limitations of pie charts, and be mindful that they are typically read in a clockwise direction from 12 o'clock. So, put your most important slice (often the largest one) first, starting from 12 o'clock.
  • Be aware of colour palettes and how they can strengthen or dampen perceptions of categorical, sequential, and divergent data.
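
To illustrate the histogram-versus-bar-chart distinction noted above, here is a minimal sketch in Python using matplotlib; all of the data are invented. The bar chart counts discrete categories, while the histogram bins a continuous variable.

    # Minimal sketch: a bar chart for discrete categories beside a histogram for a
    # binned continuous variable. All data below are invented for illustration.
    import matplotlib.pyplot as plt

    workshops = ["Note-taking", "Time mgmt", "Exam prep"]   # discrete categories
    registrants = [40, 25, 63]
    minutes_studied = [35, 48, 52, 60, 61, 75, 80, 82, 90, 95, 110, 120, 125, 140]  # continuous

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

    ax1.bar(workshops, registrants)
    ax1.set_title("Registrants by workshop (bar chart)")
    ax1.set_ylabel("Registrants")

    ax2.hist(minutes_studied, bins=5)
    ax2.set_title("Reported study-session length (histogram)")
    ax2.set_xlabel("Minutes")
    ax2.set_ylabel("Respondents")

    fig.tight_layout()
    plt.show()
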
Assessment for Learning

It is important to remember that assessment can be employed not only as a measure of learning but also as a tool for learning. Engaging meaningfully with students on assessment projects can be a form of pedagogy, a powerful teaching moment in which, through the exercise of assessment, valuable insights about learning can be gleaned from participants. As an example of this, and to illustrate several key cautions outlined above, let's look at the deployment of psychometric instruments.

There are many so-called "intake instruments" available to Learning Strategists that are used as tools to establish a "baseline" of a student's current state of knowledge, awareness and use of study and learning skills or metacognitive strategies. These can be developed "in-house", as many learning skills departments have done, or practitioners can deploy more "standardized" instruments (often proprietary), which can leverage large-scale data sets. Many readers will be familiar with the popular LASSI (Learning and Study Strategies Inventory), which has been commonly used for these purposes in our work.

These tools can be used as "pre-appointment" exercises but can be more helpful when used in real time with students during appointments to ignite critical discussion and unpack broader challenges, concerns, and strengths in conversation—in other words, as assessment for learning.

Other instruments of this genre include:  

Astute readers will have noticed that all psychometrics are arguably problematic, as they’re structured from specific perspectives to individualize—at times, even to pathologize—entirely normal responses to untoward sociopolitical conditions that harm and demoralize some while rewarding and celebrating others. Students’ responses to the ways these societal conditions manifest in their daily lives may appear suboptimal (or may be “assessed” as substandard or poorly conceived) but often have nontrivial strategic value that students can’t easily forgo, at least not without a comprehensive, empathetic conversation about why, for instance, they just can’t bring themselves to focus on a certain assignment or why they feel sure they need to take a certain number of courses.
The preceding is a smattering of immediately practical suggestions, and their rationales, intended to help the Learning Strategist who is not an expert assessment professional to effectively collect, analyze, and meaningfully report on the nature and impact of their work.
