How XR Can Unleash Cognition

--

(And augment us in other ways)

I needed a thumbnail pic and cosmic brain wasn’t AR-enough.

Note: I’m using XR here to mean anything on the virtuality continuum, so AR, VR and everything in-between. I prefer to see the entire continuum as one unified concept.

Framing: Why is this important?

Our world is ever more saturated with technology. It is fundamentally powered by science, reason, and critical thinking. And yet, paradoxically, we also live in a world where “don’t make me think” is a popular meme in design circles. Carl Sagan once said:

“We live in a society absolutely dependent on science and technology and yet have cleverly arranged things so that almost no one understands science and technology. That’s a clear prescription for disaster.”

Powerful tech is pitched with such premises as “It’s Magic!” (looking at you, Magic Leap), “it just works”[1][2], and “the technology disappears”.

I’ll admit, I love a good UI. I love when it feels effortless, like I’m one with the machine. And that on its own isn’t a problem — but there’s a danger to taking that to the extreme and creating a world where everything we rely on is incomprehensible ‘magic’. Technology that ‘disappears’ becomes difficult if not impossible to understand, and everything ‘just works’… right up until it doesn’t. Then what?

Describing Uber as if it were a mystic deity is probably not a good thing.

Learning and mastery drive everything about modern life, but they have innate friction. Every time you have to figure out a problem or practice a skill, your brain takes on extra mental effort, known as ‘cognitive load’. This takes work, and requires defeating inertia. Your brain functions on a ‘use it or lose it’ basis, so this friction is directly responsible for growing your mental model. It’s also the friction that many designers are constantly trying to ‘optimize’ out of existence. That time you forgot a hotkey and had to rummage through your mind to find it? That strengthened your internalization of how to perform the action and deepened the pathways. It also contributed a small amount to your typing speed and motor control, as well as your emotional control for handling failure. All this adds up over time.

There’s further evidence that hyper-optimization has deleterious effects. For people who use GPS to drive everywhere, or who spend a long time in open-world games with map markers and heavy navigation aids, under-utilization of spatial navigation skills may lead to atrophy of the brain regions that support them. (This is a pretty strong claim, so if you have counter-evidence please post it as a response.)

Aside from over-optimization, there’s another harmful trend: obscurity. Ever notice that things are getting harder to repair? The once-modular computer is ever more a fully integrated device. Bootloaders are more frequently locked, and specs are harder to find.

What’s the big deal though? Tech is just a bunch of gadgets and tools, right? No! Personal tech is more than that — an extension of the self, and, as we go into the future, sometimes a literally integrated component. People use their smartphones as a primitive exocortex, memory prosthetic, and navigation aid. Many won’t even leave the house without it, and when they do, they report feeling as though a part of themselves is missing. On top of this, XR enables spatial computing, where we project ourselves into the world. Getting this wrong could be disastrous.

Now with that framing out of the way, here is my main assertion:

A lack of critical thinking, encouraged by “don’t make me think” design and exacerbated by obfuscation and lock-down, is pushing us towards a world where no one thinks. XR can solve this by embedding natural, playful education, understanding, and skill-building into the world itself.

How can this work? Let’s go over some things that can be improved by always-on XR.

Improving Discovery

First, we can improve the discoverability of concepts. Imagine a system where, everywhere you go, almost every object has an unobtrusive icon floating nearby. The icon is a signifier of more information. When you pay enough attention to the icon, it expands, explaining what the item is, with a simple list of components. If you look away for long enough, it collapses back into its unobtrusive state. Once expanded, the user can then use an interaction to query more and more data. The idea is to allow the user to interrogate the system, at their own pace, about the item or concept in question. Taken to its logical extreme, we’d need to retrofit or fork some massive, Wikipedia-scale database and start adding geographical correlations to anything that can exist in meatspace. Either way, having relevant data as close to the source as possible accomplishes a natural mapping, which should reduce friction for an action that we consider virtuous (learning). We’ll intentionally reintroduce friction later, so reducing it here isn’t an issue.
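
As a minimal sketch of how that attention-driven expand/collapse might behave, here is a hypothetical gaze-dwell state machine in Python. The class name, thresholds, and per-frame update are assumptions for illustration, not any particular XR SDK’s API.

```python
# Hypothetical gaze-dwell logic for a discovery icon: expand after sustained
# attention, collapse after sustained inattention. Thresholds are arbitrary.

class InfoIcon:
    EXPAND_AFTER = 0.8    # seconds of sustained gaze before expanding
    COLLAPSE_AFTER = 2.0  # seconds of looking away before collapsing

    def __init__(self):
        self.expanded = False
        self.gaze_time = 0.0
        self.away_time = 0.0

    def update(self, gazed_at: bool, dt: float) -> None:
        """Call once per frame with whether the user is looking at the icon."""
        if gazed_at:
            self.gaze_time += dt
            self.away_time = 0.0
            if not self.expanded and self.gaze_time >= self.EXPAND_AFTER:
                self.expanded = True   # show the item's name and component list
        else:
            self.away_time += dt
            self.gaze_time = 0.0
            if self.expanded and self.away_time >= self.COLLAPSE_AFTER:
                self.expanded = False  # fold back into the unobtrusive icon
```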

This idea of providing easy discovery points in the world is pretty much AR 101, though, and it really only helps discovery — so we’ll have to expand upon it. Which leads us into…

Improving Comprehension

One cold, hard fact about XR tech that we’re going to have to come to terms with is that it is fundamentally a cybernetic technology. It has to meet human needs and be adaptive enough to accommodate the vast differences from person to person. For comprehension, I’d like to bring up the differences in how individuals understand and build conceptual models of the world. In contrast, the way we teach things tends to be single-mode: the student is either processing text and still images, a human speaking with body language, or a video. And even if the material is in the mode the student comprehends best, its difficulty could be way above or below the person’s level. In practice, it’s likely that neither the mode nor the difficulty is ideal.

This means that our systems must be capable of explaining a single concept from a variety of different perspectives. Some people are fine reading about a topic. Others need a visualization. Some need to hear it spoken. Others need to touch. Or the ideal solution could involve fusing multiple approaches at once, or seamlessly switching between them.

CalcFlow does some of this for different kinds of math.

XR lets you do this, almost by nature. Visualization and animation of 3D concepts, as well as physics and audio simulation, are things XR does very well.

Let’s consider a non-math example — say, an augmented oven with a stovetop. An XR stovetop might have buttons with icons by default, but when you inspect them closely you get text-based tooltips — simple names with fly-outs for deep mechanistic explanations.

When inspecting a button, it would highlight the circuit, showing how electricity flows to the element.

When the button is pressed, the circuit glows with an intensity that maps to the current running through it, and it will give exact numbers if inspected closely. You could watch heat build up in the element, again with exact numbers available. We could also provide audio feedback: spatially tracked emitters whose hum indicates where heat is concentrated and how intense it is — not just on the element, but on the pan and the food as well. Moving parts like fans, whether in the fume hood or in the oven, get drawn in place, animating at the exact speed of their real-world counterparts.

Make all the things see through! Also seriously HMD manufacturers, thermal vision!
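
To make that mapping concrete, here is a rough sketch of how sensed values might drive the overlay. The sensor ranges, field names, and calibration constants are assumptions for illustration, not a real appliance API.

```python
# Illustrative mapping from sensed values to overlay intensity.
# MAX_CURRENT_A and MAX_TEMP_C are assumed calibration constants.

MAX_CURRENT_A = 10.0
MAX_TEMP_C = 300.0

def normalize(value: float, max_value: float) -> float:
    """Clamp a reading into the 0..1 range used by visuals and audio."""
    return max(0.0, min(value / max_value, 1.0))

def overlay_state(current_amps: float, element_temp_c: float) -> dict:
    return {
        "circuit_glow": normalize(current_amps, MAX_CURRENT_A),      # glow brightness
        "hum_volume": normalize(element_temp_c, MAX_TEMP_C),         # spatial hum loudness
        "label": f"{current_amps:.1f} A, {element_temp_c:.0f} °C",   # exact numbers on close inspection
    }

print(overlay_state(6.2, 180.0))
```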

All of this will expose the user to previously hidden rules — the properties of materials that dictate electron flow, how heat accumulates in and alters different materials, how circuits are set up, how mechanical linkages interact, and so on. This eventually builds intuition for things like how food placement inside a pan influences the final result, how resistance creates heat, or how friction wears things down. Packing this insight into casual, everyday life means that previously insurmountable problems, like a faulty component, can be diagnosed right away, because the black box is now see-through. You might notice that something isn’t performing the way it’s supposed to long before a critical failure happens, and when one does, there’s a chance you’ll know what went wrong just by looking. No need to rely on expensive professionals when you can order a part or do a patch yourself to get things working again. The best part is that you’ll start to notice similarities across systems — so new skills can generalize. You’ve augmented your user into an even more powerful always-on learning machine.

To be fair, we are getting better at education — different teachers create explanations from their unique perspectives, and the best teachers can construct them from multiple perspectives using various media. But those people are scarce, and they’re constrained to human scale. Furthermore, a human teacher has no control over the student’s sensory inputs and can’t feasibly construct a fully immersive experience, let alone the kind of real-time, infinitely mutable one that XR can. XR combined with machine learning and/or some other data-driven or schema-driven framework could automate this process.

Also, since this makes seeing the world as a set of layers and different perspectives the default, it has the interesting side effect of fostering skepticism and rationality. If you’re interrogating the world in a system like this, you’ll eventually hit a part of the database with multiple theories which you must compare side by side!

Improving Retention & Defeating Inertia via Reinforcement and Virtuous Friction

For anyone to retain their skills, they need to take advantage of reinforcement learning — basically, how we learn through repetition. The more often you do something, the deeper the neural pathways involved in the action become. All the effort put into exposing people to the mechanics of the world will be worthless if they never engage with the material. We need to get people to *want* to train themselves. To do this, we need to make the act of learning rewarding.

Rewards
There are two kinds of rewards we can use: intrinsic and extrinsic (the former being the superior option). The first approach we’ll use is a concept called ‘juice’. Let’s say we’ve got a simple sorting task, where a user has a series of items that have to be put into appropriate boxes. When the user makes a successful placement, they should get rich, satisfying feedback, like a hard bass hit and a graphical flourish — maybe even distortion waves that emanate from the object. This feedback can be scaled to reflect how accurate they were (like the judgment arrows in Dance Dance Revolution).
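
To make that concrete, here is a toy grading function in the spirit of DDR’s judgment tiers. The error thresholds, grade names, and feedback parameters are all invented for illustration.

```python
# Invented accuracy tiers mapping placement error to feedback intensity.
# error_cm is how far the item landed from the box's ideal spot.

def placement_feedback(error_cm: float) -> dict:
    if error_cm < 1.0:
        return {"grade": "perfect", "bass_gain": 1.0, "particles": 200}
    if error_cm < 5.0:
        return {"grade": "great", "bass_gain": 0.6, "particles": 80}
    if error_cm < 12.0:
        return {"grade": "ok", "bass_gain": 0.3, "particles": 20}
    return {"grade": "miss", "bass_gain": 0.0, "particles": 0}

print(placement_feedback(3.2))  # a 'great' hit with medium juice
```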

Another way to make an action intrinsically rewarding is to base it on something that is innately pleasant, like the flowing kinesthetic motions of dance. This meshes well with gestural input. One trick to making this feel good is to be quite generous with the activation conditions for gestural input (let your user be lazy), but when showing them how to perform the input, demonstrate a more flowy and fun motion. Your user won’t always be in the mood to dance, but reminding them that using a computer is supposed to be fun will go a long way. Also, juice and kinesthetic rewards can feed into each other — perhaps the activation feedback for a gestural command can have livelier sounds and brighter particle effects the more speed and grace the user applies!
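
One way to express ‘be generous on activation, reward grace’ in code: the gesture fires at a low confidence threshold, while speed and smoothness only modulate how lively the feedback is, never whether the command succeeds. The threshold and scoring below are purely illustrative.

```python
# Illustrative split between a lenient activation check and a grace score.
# All inputs are assumed to be normalized to the 0..1 range by the tracker.

ACTIVATION_THRESHOLD = 0.4   # low bar: let the user be lazy

def gesture_result(match_confidence: float, speed: float, smoothness: float) -> dict:
    activated = match_confidence >= ACTIVATION_THRESHOLD
    # Grace only scales the celebration, from muted (0.0) to exuberant (1.0).
    grace = min(1.0, 0.5 * speed + 0.5 * smoothness) if activated else 0.0
    return {"activated": activated, "feedback_intensity": grace}

print(gesture_result(match_confidence=0.45, speed=0.3, smoothness=0.9))
```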

An example of some ‘juicy’ objects. Adding juice makes the normally boring act of slapping shapes fun!

The important thing is that the actions of the user need to feel good to perform. Game designers call this “game feel”.

We do want to be careful not to overdo it, though. Distracting from goals is not helpful, and there are only so many particle effects and impact noises you can add before it gets obnoxious. Less is more, and little details make all the difference!

The game designers among you have no doubt figured out that we’re turning everything into a game about learning and mastery. The game elements are what aid in reinforcement learning: if your application is fun, users will come back to it repeatedly, providing the time needed to cement their mastery.

The game designers are also already lamenting the fact that while rewards should ideally be the juicy, intrinsic kind, it can be extremely difficult (and expensive) to make everything like that. Extrinsic rewards can be a bit more ethically dubious, but they are easier to create, use, and balance. They usually come in the form of classic progression systems: XP, levels, unlocks, achievements — that kind of thing. Don’t use an extrinsic reward unless you absolutely cannot come up with an intrinsic one! This is the mistake that most gamified systems make, and it’s why they have a reputation for being manipulative. Yes, a Skinner box will get your users to come back repeatedly, but they won’t be getting anything out of it.

A good candidate for extrinsic rewards would be a progression marker for how many times a user looks at something and asks questions like “what’s this?” or “why did that happen?”. You could also have an achievement for reading the Wikipedia article associated with a certain number of tagged real-world objects.
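
A toy sketch of that kind of progression tracker, with invented action names and thresholds:

```python
# Invented progression tracker: counts curiosity actions and unlocks
# achievements at arbitrary thresholds.

from collections import Counter

ACHIEVEMENTS = {"whats_this": [10, 100, 1000], "article_read": [5, 50]}

class CuriosityTracker:
    def __init__(self):
        self.counts = Counter()

    def record(self, action: str) -> list[str]:
        """Record an action and return any achievements it just unlocked."""
        self.counts[action] += 1
        return [f"{action} x{t}" for t in ACHIEVEMENTS.get(action, [])
                if self.counts[action] == t]

tracker = CuriosityTracker()
for _ in range(10):
    unlocked = tracker.record("whats_this")
print(unlocked)  # ['whats_this x10'] on the tenth query
```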

For a more detailed example, let’s go back to the stove from earlier. To skill up our user, we could provide simple goals, escalating in difficulty, for each meal. They could be goals like ‘get x% of noodles in the pot to reach a softness of y’, or ‘make this dish which has more steps than the last one you tried’. A nice feature of cooking is that when it’s done, you can pretty much bet on the user sitting down and eating immediately afterwards. Most people do something passive while eating like watching TV — this time could be used to document how well the goals were hit, show them how the complexity of their meals has increased over time, and display historical data about what they’ve learned in general.

Virtuous Friction
Now that we’ve made our interactions worth returning to, we need to keep them that way. Remember early in the Discovery section when I mentioned that we’d be re-introducing friction after cutting some of it down? This is where we’ll be doing that.

One of the best examples of virtuous friction is difficulty scaling.

Let’s look at how this could be applied to a workflow. Say we’ve got a 3D programming system reminiscent of Unreal’s Blueprints. To give the user a motor-control workout, we can take the task of connecting nodes and make it slightly harder after each successful connection: the connection points start huge and scale down over time. If we wanted to add hand-eye coordination training, we could have the code boxes stay perfectly stationary at first, but later add small micro-motions when the user goes to make connections, requiring them to predict the position of the target and place connections with a bit more skill. Use the aforementioned juice to make mistakes less disheartening.
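
A minimal sketch of that scaling, assuming made-up sizes, a shrink factor, and a success counter:

```python
# Illustrative difficulty scaling: connection targets shrink with each success,
# and small micro-motions appear once the user is past a competence threshold.

import math

START_RADIUS_CM = 8.0
MIN_RADIUS_CM = 1.0
SHRINK_PER_SUCCESS = 0.95   # multiplicative shrink per successful connection
WOBBLE_AFTER = 25           # successes before targets start micro-moving

def target_radius(successes: int) -> float:
    return max(MIN_RADIUS_CM, START_RADIUS_CM * SHRINK_PER_SUCCESS ** successes)

def wobble_offset_cm(successes: int, t: float) -> float:
    """Tiny oscillation added to the target position once the user is skilled."""
    if successes < WOBBLE_AFTER:
        return 0.0
    amplitude = min(1.5, 0.05 * (successes - WOBBLE_AFTER))
    return amplitude * math.sin(2.0 * t)

print(target_radius(0), target_radius(30), wobble_offset_cm(30, 1.0))
```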

Whenever we’re working with difficulty scaling, we can plot friction over time on what’s commonly called the difficulty curve. This is important because it lets us conceptualize a feature of human psychology: the flow state.
The flow state is a critical part of learning and mastery. When attempting to accomplish a goal, flow kicks in when the challenge, and the player’s interest in meeting it, begins to align with the player’s skill. Everything else begins to fade away, and the player gets ‘in the zone’.

The blue curve represents the difficulty level. If the task becomes too difficult, the player gives up. Too easy and they disengage. Friction and rewards are the tools used to bend the curve.
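
One crude way to keep the player inside that flow channel is to nudge difficulty toward an estimate of their skill each session. All of the constants below are assumptions, not tuned values.

```python
# Crude flow-channel regulator: difficulty drifts toward estimated skill,
# staying within a band so the task is never boringly easy or hopelessly hard.

BAND = 0.15   # how far difficulty may stray from skill (arbitrary units)
STEP = 0.05   # how fast difficulty adapts per session

def next_difficulty(difficulty: float, skill_estimate: float) -> float:
    if difficulty < skill_estimate - BAND:
        difficulty += STEP   # too easy: add friction
    elif difficulty > skill_estimate + BAND:
        difficulty -= STEP   # too hard: ease off, lean on rewards instead
    return difficulty

difficulty = 0.2
for skill in (0.25, 0.4, 0.6, 0.6):
    difficulty = next_difficulty(difficulty, skill)
    print(round(difficulty, 2))   # 0.2, 0.25, 0.3, 0.35
```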

More Use Cases

So we’ve gone over some ways always-on XR applications can improve our lives and selves, but maybe the stove and visual-scripting examples aren’t convincing you. Let’s explore more use cases. Most aren’t feasible to build today, but they’re worth thinking about now.

The Failure Debugger: Because optical tracking is ubiquitous in XR systems, it is possible to store a history of everything the user does, given enough storage space. If you get stuck on a task like assembling a statue out of Lego bricks, you could reference an action history much like the call stack in a code debugger — go back through the steps, one by one, until you find where things went wrong. This can vary in sophistication: it’s easier to just let the user scrub their video feed back and forth through time and figure things out manually, but it’s much more helpful to have software automatically analyze the feed and turn it into a step-by-step breakdown with navigational bookmarks. The latter is much, much harder from a technical standpoint — the first approach is necessarily general-purpose, while the second requires specialized treatment per use case.
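
The simpler version of this is little more than a timestamped action log with bookmarks you can walk backwards through. A sketch, with all field names invented:

```python
# Invented structure for the simple version of the Failure Debugger:
# a timestamped action history the user can step back through.

from dataclasses import dataclass, field

@dataclass
class Step:
    timestamp: float      # seconds into the session
    description: str      # e.g. "attached left wing to fuselage"
    video_offset: float   # where to seek in the recorded feed

@dataclass
class ActionHistory:
    steps: list[Step] = field(default_factory=list)

    def record(self, step: Step) -> None:
        self.steps.append(step)

    def rewind(self, from_index: int):
        """Walk backwards from a step until the user spots the mistake."""
        yield from reversed(self.steps[: from_index + 1])

history = ActionHistory()
history.record(Step(12.0, "placed base plate", 12.0))
history.record(Step(40.5, "attached left wing (backwards?)", 40.5))
for step in history.rewind(1):
    print(step.timestamp, step.description)
```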

Philosopher’s Debugger: This one relates well to the main theme of critical thinking. When processing a topic, it is helpful to keep track of what kind of logic is being used, which arguments support which assertions, and so on — even more so when debating with an interlocutor, where it’s easy to forget what was said and by whom. To solve this, I propose an XR program called the Philosopher’s Debugger. As you debate, it uses the microphone in the HMD to record what was said and who said it, providing a conversation log that can be scrolled through. It can draw connections between arguments and counter-arguments, and highlight common logical failures like straw-man arguments, ad hominem attacks, false dichotomies, loaded questions, and so on. Productive thought is non-linear in nature, but verbal communication is linear and real-time; this would give you the best of both.
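
A rough data-structure sketch for the conversation log and argument links. The speakers, link fields, and fallacy tags are illustrative, not the output of a real argument-mining model.

```python
# Illustrative conversation log: utterances are nodes, links mark which
# statement supports or rebuts which, with optional fallacy tags.

from dataclasses import dataclass, field

@dataclass
class Utterance:
    speaker: str
    text: str
    supports: list[int] = field(default_factory=list)   # indices of claims this backs
    rebuts: list[int] = field(default_factory=list)      # indices this argues against
    fallacies: list[str] = field(default_factory=list)   # e.g. "straw man", "ad hominem"

log: list[Utterance] = [
    Utterance("A", "We should fund public transit."),
    Utterance("B", "So you want to ban all cars?", rebuts=[0], fallacies=["straw man"]),
]
for i, u in enumerate(log):
    print(i, u.speaker, u.text, u.fallacies)
```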

Propaganda & Photoshop ‘spidey sense’: Forensic imaging techniques can detect when images and videos have been manipulated. Combine this with haptic wearables like Exoskin, used in a sensory-substitution fashion, and you can give your user the ability to ‘feel’ when they are being exposed to things meant to manipulate them. A BCI (brain-computer interface) would be helpful here as well — if your user looks at an image and you get an immediate fearful or indignant reading from them, that’s a strong indicator of manipulation.

Memory prosthetic:
This one could have had its own section under ‘improving memory’. Like the failure debugger, the ability to record a user’s first-person experience is the foundation. You could store everything a person does, then make it queryable. A user could ask ‘where did I leave my <item>?’, and the system can use the records to find the most likely place and draw a path to that location. (Introduce a margin of error if you want to give the user a challenge.) If you’ve got a BCI, this could be even more seamless: the user mentally visualizes the item they wish to find, and the system matches their brainwaves against those recorded and associated with the items they own. You could use the same technique to correct for ‘tip of the tongue’ states.
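
The ‘where did I leave my item?’ query could be as simple as finding the most recent matching sighting in the recorded history. A toy sketch with invented record fields:

```python
# Toy version of the item-finding query: scan the recorded sightings for the
# most recent one matching the requested item and return its location.

from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Sighting:
    timestamp: float
    item: str
    location: tuple[float, float, float]  # position in the room's coordinate frame

def last_seen(history: list[Sighting], item: str) -> Sighting | None:
    matches = [s for s in history if s.item == item]
    return max(matches, key=lambda s: s.timestamp) if matches else None

history = [
    Sighting(100.0, "keys", (1.2, 0.9, 3.4)),
    Sighting(220.0, "keys", (0.1, 1.1, 2.0)),  # most recent: draw the path here
]
print(last_seen(history, "keys"))
```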

In addition to finding things, you could store entire memories and let the user re-live them from various perspectives. This could help prevent the degradation of memories over time, since a digital recording is less prone to decay than a neurologically encoded one.

Another memory prosthetic application is remembering people’s names and relationships. If this is done entirely offline, it would have the added benefit of putting identity back into the hands of individuals. If someone gives you an alias, and that’s all you and your personal XR equipment experience, you’ll only ever know them by the parameters *they* define.

Down-time enforcer and sensory overload protection:
Much like we can facilitate active improvement, we can also encourage disconnecting when necessary. One application you hear lots of people talking about is AR ad-blocking, and that idea can be generalized much further: everything from censoring a user’s work tools if they’re crunching too much, to drawing a stressed person’s attention to a gentle, relaxing experience.

Unsolved Problems

Lots and lots of hardware problems

  • To run background processes like “when is this user OK with being interrupted for quizzes and exercises?”, and to enforce user-beneficial policies such as “ensure users get enough mental recharging time”, we’ll need primitive BCI. Maybe something like Neurable is up to the task?
    BCI isn’t needed for difficulty management, but it certainly helps.
  • For gathering user intent and giving them very precise inputs, we need eye tracking.
  • High-fidelity hand tracking exists, but adoption needs to become ubiquitous. In general, we need far better body and brain tracking to even begin to build this kind of system.
  • Powerful wearable computers that can work anywhere and remain entirely in the hands of users, instead of users having their augments taken away because they missed a payment or a cloud service shut down. Ideally, something with the compute of a modern gaming laptop would be made effortlessly wearable for days or weeks at a time. Advancements in materials and design will be necessary.
  • For AR devices, the ability to render convincing shading. Physical UI controls are often more natural than flat UI, but without proper shading, users will struggle to understand the shape and location of virtual objects.

Design Problems

  • The vastness of the design space. Every room is different, no shape or volume is guaranteed to be available and worse, environments change!
  • The vastness of the user space. Every person is unique. Morphology and cognition are not even remotely uniform across the population.

I think these problems will ideally be solved by procedural, generative design. If you create a workspace at a fixed height based on the average person’s height, you get something that works for most but works well for no one. If you create a workspace at the height of the user’s abdomen, it will work for everyone. If you build a virtual environment for a fixed 1.4 m x 2 m space, it works in most rooms. If you build it so that item placement is a function of surface area and surface normal, boundaries can be arbitrary, and agents navigate instead of following fixed paths, your virtual environment can go almost anywhere.
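
As a toy example of the ‘function of the user and the room, not fixed numbers’ idea (the proportion and aspect ratio below are invented, not anthropometric or ergonomic data):

```python
# Toy generative placement: workspace height derives from the user's body,
# and a virtual board scales to whatever wall surface is actually available.

def workspace_height_m(user_height_m: float) -> float:
    # Roughly abdomen height; 0.58 is an invented proportion.
    return 0.58 * user_height_m

def fit_board(wall_width_m: float, wall_height_m: float,
              aspect_ratio: float = 1.6) -> tuple[float, float]:
    """Scale a virtual board to fill the wall while keeping its aspect ratio."""
    width = min(wall_width_m, wall_height_m * aspect_ratio)
    return width, width / aspect_ratio

print(workspace_height_m(1.75))  # about 1.0 m for a 1.75 m tall user
print(fit_board(2.4, 1.2))       # the board is limited by the wall's height here
```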

The biggest unsolved problem is philosophical, not technical: goal alignment. How do we really make sure we’re building this for our users? User needs are as diverse as humanity itself, which is quite difficult to truly fathom. And as designers, we have a degree of power over the user; we must resist incentives to choose our values over theirs. Conventional design wisdom says ‘design for the 99%’, but just like ‘don’t make me think’, it fails in this context because of how intimate this technology is. If this technology isn’t constantly trying to align itself with the user, it will be ineffective, useless, or, in the worst case, harmful.

What about you? If your world could automatically teach you its secrets, what would you want to learn? What properties of the universe would you turn into a game? What skills would you level up?

Let me know in the responses!

--


VR industry veteran, HCI expert, interaction & UX designer. Transhumanist, nonbinary. Goth. Friend of sentient machines. They/them or she/her