Multimodality: A Must for Effective Human-AI Collaboration

When someone says “conversational AI” to you, the challenge is figuring out what they mean, especially since today, when people hear “AI,” they leap to generative AI thanks to the proliferation of tools like ChatGPT. AI will continue to build momentum as a force for decades to come.

Within the realm of AI, conversational AI (CAI) can range from something as simple as a plug-and-play chatbot on a website to more sophisticated virtual agents suitable for large enterprises. Enterprise-class CAI platforms aren’t as obvious as the common chat window on a website, and they are not limited to basic text interactions. CAI enables a human to have a conversation with a digital entity (i.e., an enterprise) in a very human-like manner by using something called ‘multimodality.’

We all want to have very Star Trek-like experiences interacting with these digital entities. That’s one of the reasons we all went out and bought things that reply to simple commands like ‘Hey Google’ or ‘Alexa,’ after all. The problem is that we’ve all been somewhat conditioned by scripted chatbots and call-center IVR decision trees to adapt to the interfaces of their systems. We’ve been trained to ‘push 1 for support’ or to type phrases or terms a bot can understand to get the answer we need.

This conditioning continues with nearly all generative AI interfaces and permeates our expectations for how to interact with the AI. You are typing in a 1980s-style terminal interface to get an answer and to access the magic. Typing! Further, to get the most from these generative AI platforms, humans are crafting prompts in a manner the system understands. The system – not you.


Multimodality Is Key to AI-human Dialogue

You, you darn human, shouldn’t need to adapt to any channel, mode, or language just to have a conversation with an enterprise’s digital human entity. Set your expectations higher! Let’s put the burden of understanding, empathizing, engaging, and reacting to your needs on the shoulders of CAI, and while we’re at it, do so in a highly ethical and transparent manner.

Multimodality isn’t so much a ‘thing’ as a sophisticated, highly tuned set of capabilities and AI disciplines that work together to enable a CAI to be more ‘human.’ It endows the CAI with the ability to communicate with and understand the humans it’s conversing with, in the manner they choose, across any channel or language. As human beings, we don’t think about ‘modes’ when we’re talking with another human. We just have a conversation, without thinking about how we’re relaying information to one another. It just ‘is,’ and that’s what multimodality facilitates in digital entity-to-human conversations.

Specifically, the CAI understands you regardless of the mode or channel you are using at any given point in the conversation. The CAI deciphers what you’re saying, how you’re saying it, and more, then alters its responses accordingly and devises a plan to help you achieve your goal based on what it thinks you want to do next. And the more naturally it can converse with you, the easier it is for you to converse with it. After all, dialogue is a two-way street.

Consider the last time you had a virtual meeting with a colleague or a friend, specifically the part where you were planning something together. You spoke. They listened. And not only did they hear you – they understood your language and reacted to your tone, emotion, and more. They parsed what you were saying ‘on the fly’ and understood what you meant. And if they didn’t understand you, they asked questions to dig deeper to gain that understanding.

Visual cues added another dimension that helped both parties interpret the conversation. They watched you. You watched them. And subconsciously, you each interpreted and responded to what the other was doing while speaking. Each of you could tell whether the other was paying attention, read the other’s mood, and notice gestures used to underscore a point or indicate an object.

If you chose to Slack, SMS, or WhatsApp that person later in the day, you’d likely expect them to remember what you both discussed and continue your conversation. And, because the channel allowed it, and it helped you to communicate, you might attach some photos, audio files, links, etc., all leading up to trying to coordinate a future face-to-face meeting together. Perhaps the dialogue was like this:

“Hey, let’s meet next week. I’d love to see you.”
“Great, what works best for you?”
“Oh you know, later in the week. How’s Thursday or Friday?”
“Sure, I can do that. Maybe Friday sometime?”
“Yeah ok, mornings are bad. How’s 1?”
“Great. See you at the coffee shop?”
“Sure.”

Throughout this interchange, you reasoned with one another, and you were each able to explain what was said and why. Even coordinating the rendezvous required a complex, subconsciously interpreted exchange of instructions, rules, and parameters that is simple for a human mind to decipher and yet oh-so complex to implement within a CAI platform.

How Could Multimodality Improve Human-AI Interactions?

Consider what’s needed when filing a claim with an insurance company representative. As a human, you may just want to take some pictures or walk around your basement and point to things: ‘See this, it’s damaged.’ Or ‘Look, here’s where the pipe broke.’ You’d expect a human agent to know what you were pointing at while you two had the conversation. Further, you’d expect the human to be fully trained, to know about your policy and the coverage you had, and to carefully guide you through the claims process to help you during your time of need. And you should expect no less from a virtual representative.

This is why multimodality, and the ability of an embodied virtual assistant to collaborate and converse with a human in as natural a manner as possible, is so important. You don’t have to stretch your imagination too far to picture this scenario without CAI (‘Please enter your account number.’ ‘Are you trying to pay your bill?’ ‘Push 4 to go back to the main menu.’). We’ve all had bad experiences. But they don’t have to be that way with CAI.

Imagine better experiences: simply being able to have a very natural-feeling conversation with many of the companies you interact with regularly, both as a customer and as an employee. With CAI, enterprise digital entities and their personas can understand and engage with you anywhere or anytime without forcing you to adapt to them. Rather, you get to be you – a human.

How far do you think we are from having natural conversations with AI? Share your thoughts with us on Facebook, Twitter, and LinkedIn. We’d love to hear from you!

Image Source: Shutterstock
