Sunday, October 14, 2018

One last purely theoretical think piece on our forthcoming radically new interfaces

The pieces are in place

Imagine a time just before speech -- when people had the physical ability to produce the same phonemes we use today, but before they had developed the practice of assigning definite meanings to sequences of phonemes. Maybe those ancestors vocalized expressively, playfully, with what we would now call babbling, glossolalia, and even song, but without having discovered the trick of maintaining elaborate, shared, memorized dictionaries of verbal labels for people, places, things, actions, etc.

There's a parallel between the unrealized potential of that situation and the situation we're in today with regard to the Internet. The computing/networking hardware we've deployed can potentially be used (programmed) in ways that will allow us to improve upon our current capabilities as significantly as the invention of speech allowed prelinguistic people to improve upon theirs.

Speech lets us convey, with a brief utterance, any idea to which a phonetic code (a word) has been assigned. We can utter several phonemes per second, thanks to the vocal dexterity conferred by the fine musculature of our lips and tongues, with which we manipulate the sounds that we generate using our diaphragms and vocal cords. This system, combined with the mental capacity to keep track of indefinitely many ideas, allowed our ancestors to share and compare precise information far more efficiently than ever before, apparently far surpassing, in many ways, what could be accomplished by any other beings on Earth.

Now, many millennia later, with our globe-girdling network of rapid, flexible devices, each possessing powerful processors along with sensitive input and vivid output capabilities, we have a system that, if used correctly, can enable us to present and imbibe information with an efficiency as far beyond that of speech or writing as speech was beyond what preceded it. Writing transforms spoken language into a durable, visual medium, but our technology can now evoke ideas far more rapidly and flexibly by presenting complex, dynamic, audiovisual structures whose meaning we can apprehend much more directly, without relying on memorized correlations of ideas to phonetic or alphabetic codes.

So we can compare the current status of our species to that of our nearly linguistic ancestors. This applies whether or not the above depiction of human prehistory is factually accurate. If the depiction is seriously flawed, or even totally fictional, then we can view the comparison as, at worst, a fitting analogy to a fictional scenario. Or we could use the analogy of a babbling toddler who will be ready, any day now, to begin speaking, or any number of other analogies. The main point -- the very salient and seemingly somewhat underappreciated thing -- is that we're right on the verge of amazingly new and improved abilities vis-à-vis communication, expression, learning, etc.

Speech, writing, and then... something else

Much of what we're currently doing with our technology involves electronically transmitting spoken and written language and recorded video. Once the electronic images and sounds begin responding continuously, and helpfully, to our input, guided by vast troves of available online wisdom, we will have created something new, which we could call a medium or modality, a new form of language or communication, etc. It doesn't seem to matter which of these labels we use, as long as we understand that we're discussing something radically advanced, relative to spoken and written language.

Rather than memorizing and deploying phonetic (spoken) or alphabetic (written) codes, we'll be memorizing and deploying relatively intuitive, representative audiovisual objects. These representations won't tax our memories the way our current language does. Dealing with vivid, evocative images and sounds is naturally easier than dealing with sequences of phonemes or of textual characters. A string of spoken words or a block of text is relatively opaque until we scan through it, at least cursorily. The audiovisual representations of the new medium, by contrast, will convey at least some meaning even when we only glance at them. Then, if we want to learn more about such a representation, the option of delving more deeply into it -- zooming into it or summoning additional, related representations -- will be readily available. If we want to learn how an object relates to any other particular object, or if we want to perform any other possible operation with something, these options too can be a few taps, or drags and drops, away.

So the meanings of everything we experience through the new medium will be relatively intuitively obvious, and when something isn't obvious enough, we'll have relatively easy and direct ways to learn about it. With people around the world communicating through representative, dynamic images/sounds instead of relatively opaque words, we can expect to expeditiously overcome a great many misunderstandings, language barriers, and other limitations.

A high, high, high priority

We spend billions of person-hours every day interacting with our connected devices. Once we've optimized the operations of our interfaces, our screen time will be vastly more fruitful and enjoyable, and we'll require significantly less screen time to perform a given action. We'll be able to accomplish multitudes of tasks that currently involve reading/writing or listening/speaking far more efficiently and effectively. (This includes learning about everything from bartending to Buddhism and beyond. The time and effort that learning requires are surely among the major constraints limiting the advancement of our species. To accelerate learning is to accelerate progress. Even skills whose acquisition requires extensive hands-on experience/practice and/or interpersonal interaction very often also entail a large book-learning component. In a virtuous cycle, our interfaces will help us get smarter faster, which will help us feed ever more intelligence back into the network, etc.) And we can expect that with the new medium there will emerge entirely new tasks, new actions, new ways of seeing and doing, possibilities and phenomena that we can't yet anticipate.

With such a prospect looming so large, it's sensible for us to sideline, or put on hold, some other priorities as we commit significant resources to the project of bringing this new medium to fruition.

Hopefully I've sufficiently communicated my reasons for being interested in this topic. In the future, rather than continuing to use words to theorize about the forthcoming medium, I plan to focus on helping to actually create it, possibly including the creation of mockups to illustrate some ideas.

3 comments:

Unknown said...

I look forward to some examples of this new form of communication. We know there is always something new coming down the technological pike, and you just might be predicting it.

Unknown said...

You talk about starting to create this breakthrough in communication... any initial steps yet? It's hard to visualize what you are visualizing... write another couple of paragraphs and simplify it for people like me who need a little boost to understand where you're going with this. For example, are you talking about new hardware... software... noware... newware???

Josh Maurice said...

I can't say I've taken any concrete software engineering steps yet. I think a sensible initial step would be to install an "engine" and start playing with it (there are various existing programs that might be useful, called game engines, real-time engines, simulation engines, etc.).

Say there's a notion you want to express. There's a variety of images on your screen. You choose the one that seems most closely related to what you want to express. When you do that, additional images, related in various ways to the image you chose, sprout up around it. These additional images are selected based in part on various aspects of your unique "subjective" context -- your location, history, etc. -- not just on their "objective" relationship to the image that you chose. You can then choose from among those new images, and so on. Eventually, an image appears that corresponds to the notion you wanted to express, and you take that image and put it where it belongs.
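
To make that a little more concrete, here is a rough sketch in Python of a single step of the process: given the image you chose, rank the images that should sprout up around it. Everything in it is an assumption for the sake of illustration, not a real design: the Image class, the suggest function, the relatedness numbers, the tags, and the simple additive scoring are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """A hypothetical on-screen image standing for a notion."""
    name: str
    # "Objective" relatedness to other images, keyed by name (0.0 to 1.0).
    related: dict = field(default_factory=dict)
    # Tags that can be matched against the user's "subjective" context.
    tags: frozenset = frozenset()


def suggest(chosen, catalog, context_tags, k=3, context_weight=0.5):
    """Return up to k images to sprout around the chosen image.

    A candidate's score is its objective relatedness to the chosen image
    plus a bonus for each tag it shares with the user's context.
    """
    scored = []
    for name, relatedness in chosen.related.items():
        candidate = catalog[name]
        bonus = context_weight * len(candidate.tags & context_tags)
        scored.append((relatedness + bonus, name))
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [catalog[name] for _, name in scored[:k]]


# A toy catalog; the numbers and tags are made up.
catalog = {
    "food": Image("food", {"pizza": 0.6, "sushi": 0.6, "salad": 0.7}),
    "pizza": Image("pizza", tags=frozenset({"new_york"})),
    "sushi": Image("sushi", tags=frozenset({"tokyo"})),
    "salad": Image("salad"),
}

# For a user whose context includes New York, pizza is ranked first.
shown = suggest(catalog["food"], catalog, frozenset({"new_york"}), k=2)
print([image.name for image in shown])  # ['pizza', 'salad']
```

Repeating this step on whichever image you pick next gives the "and so on" part. A real system would presumably learn the relatedness and context weights rather than have them typed in by hand.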

It will take some time for the system to evolve to the point where this image-selection process is unambiguously more efficient than using written or spoken words to express yourself, but it can eventually become far more efficient.

I'm mostly talking about new software, which could run on the hardware we have now, although more powerful devices and better bandwidth would improve the functioning of the system I'm envisioning.

I posted some more speculation about what the interfaces may look like on February 3, 2017 ("I can see your halos: Further depiction of forthcoming interfaces"): https://jmmsynch.blogspot.com/2017/02/i-can-see-your-halos-further-depiction.html