The pieces are in place
Imagine a time just before speech -- when people had the physical ability to produce the same phonemes we use today, but before they had developed the practice of assigning definite meanings to sequences of phonemes. Maybe those ancestors vocalized expressively, playfully, with what we would now call babbling, glossolalia, and even song, but without having discovered the trick of maintaining elaborate, shared, memorized dictionaries of verbal labels for people, places, things, actions, etc.
There's a parallel between the unrealized potential of that situation and the situation we're in today with regard to the Internet. The computing/networking hardware we've deployed can potentially be used (programmed) in ways that will allow us to improve upon our current capabilities as significantly as the invention of speech allowed prelinguistic people to improve upon theirs.
Speech lets us convey, with a brief utterance, any idea to which a phonetic code (a word) has been assigned. We can utter several phonemes per second, thanks to the vocal dexterity conferred by the fine musculature of our lips and tongues, with which we manipulate the sounds that we generate using our diaphragms and vocal cords. This system, combined with the mental capacity to keep track of indefinitely many ideas, allowed our ancestors to share and compare precise information far more efficiently than ever before, apparently far surpassing, in many ways, what could be accomplished by any other beings on Earth.
Now, many millennia later, with our globe-girdling network of rapid, flexible devices, each possessing powerful processors along with sensitive input and vivid output capabilities, we have a system that, if used correctly, can enable us to present and imbibe information with an efficiency that surpasses that of speech and of writing by another enormous margin. Writing transforms spoken language into a durable, visual medium, but our technology can now evoke ideas far more rapidly and flexibly by presenting complex, dynamic, audiovisual structures whose meaning we can apprehend much more directly, without relying on memorized correlations of ideas to phonetic or alphabetic codes.
So we can compare the current status of our species to that of our nearly linguistic ancestors. This applies whether or not the above depiction of human prehistory is factually accurate. If the depiction is seriously flawed, or even totally fictional, then we can view the comparison as, at worst, a fitting analogy to a fictional scenario. Or we could use the analogy of a babbling toddler who will be ready, any day now, to begin speaking, or any number of other analogies. The main point -- the salient and seemingly underappreciated thing -- is that we're right on the verge of dramatically new and improved abilities in communication, expression, learning, etc.
Speech, writing, and then... something else
Much of what we're currently doing with our technology involves electronically transmitting spoken and written language and recorded video. When the electronic images and sounds have begun responding continuously, and helpfully, to our input, guided by vast troves of available online wisdom, then we will have created something new, which we could call a medium or modality, a new form of language or communication, etc. It doesn't seem to matter which of these labels we use, as long as we understand that we're discussing something radically advanced, relative to spoken and written language.
Rather than memorizing and deploying phonetic (spoken) or alphabetic (written) codes, we'll be memorizing and deploying relatively intuitively representative audiovisual objects. These representations won't tax our memories the way our current language does. Dealing with vivid, evocative images and sounds is naturally easier than dealing with sequences of phonemes or of textual characters. A string of spoken words or a block of text is relatively opaque until we scan through it, at least cursorily. The audiovisual representations of the new medium, by contrast, will convey at least some meaning even when we only glance at them. Then, if we want to learn more about such a representation, the option of delving more deeply into it -- zooming into it or summoning additional/related vivid/evocative representations -- will be readily available. If we want to learn how an object relates to any other particular object, or if we want to perform any other possible operation with something, these options too can be a few taps, or drags and drops, away.
So the meanings of everything we experience through the new medium will be relatively intuitively obvious, and when something isn't obvious enough, we'll have relatively easy and direct ways to learn about it. With people around the world communicating through representative, dynamic images/sounds instead of relatively opaque words, we can expect to expeditiously overcome a great many misunderstandings, language barriers, and other limitations.
A high, high, high priority
We spend billions of person-hours every day interacting with our connected devices. Once we've optimized the operations of our interfaces, our screen time will be vastly more fruitful and enjoyable, and we'll require significantly less screen time to perform a given action. We'll be able to accomplish, far more efficiently and effectively, multitudes of tasks that currently involve reading/writing or listening/speaking. (This includes learning about everything from bartending to Buddhism and beyond. The time and effort that learning requires are surely among the major constraints limiting the advancement of our species. To accelerate learning is to accelerate progress. Even skills whose acquisition requires extensive hands-on experience/practice and/or interpersonal interaction very often also entail a large book-learning component. In a virtuous cycle, our interfaces will help us get smarter faster, which will help us feed ever more intelligence back into the network, etc.) And we can expect that with the new medium there will emerge entirely new tasks, new actions, new ways of seeing and doing, possibilities and phenomena that we can't yet anticipate.
With such a prospect looming so large, it's sensible for us to sideline, or put on hold, some other priorities as we commit significant resources to the project of bringing this new medium to fruition.
Hopefully I've sufficiently communicated my reasons for being interested in this topic. In the future, rather than continuing to use words to theorize about the forthcoming medium, I plan to focus on helping to actually create it, possibly including the creation of mockups to illustrate some ideas.