Another Brick in the Web

A screenshot from the project [**This is not a concert**](https://concert.livecoding.space), in which the performer's pose is tracked by AI

Introduction

A few decades ago, no one would have believed that a web browser could compete with desktop multimedia applications in creating rich audio and visual experiences. Yet using a web browser on stage for a live performance still seems like a distant future. Several critical points remain: handling large media environments with physical interfaces, audio-specific programming languages and DSP, and high-fidelity audio and video that can interact with many users simultaneously. In this article, we’ll try to sort out how to bring that future closer to reality.

Down the rabbit hole

A look into the browser’s history reveals its roots in the evolution of programming architectures for software agents. In simple words, the browser was seen as an autonomous, intelligent mediator between a human and their digital representative, interconnected through a network. The underlying programming language, JavaScript, mainly served for writing scripts that manipulate objects, much like a foreign function interface (FFI) in conventional programming languages. But modern web-browser standards provide direct access to low-level facilities such as audio (Web Audio), graphics (WebGL), general-purpose computing on the GPU (WebGPU), networking (WebSockets) and even machine code (Wasm). All of that gives us every reason to expect the browser to soon become a seamless environment for holding audio-visual performances.
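
As a minimal illustration of how directly these low-level facilities are exposed, the following sketch plays a sine tone through the Web Audio API; the frequency, gain and duration are arbitrary choices for the example.

```js
// Minimal Web Audio sketch: a 440 Hz sine tone, faded in to avoid clicks.
// Browsers require a user gesture before audio may start, hence the click handler.
document.addEventListener("click", () => {
  const ctx = new AudioContext();
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  osc.type = "sine";
  osc.frequency.value = 440;                                    // A4, arbitrary for the example
  gain.gain.setValueAtTime(0, ctx.currentTime);
  gain.gain.linearRampToValueAtTime(0.2, ctx.currentTime + 0.05);

  osc.connect(gain).connect(ctx.destination);
  osc.start();
  osc.stop(ctx.currentTime + 2);                                // stop after two seconds
}, { once: true });
```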

A browser inside a browser, inside a browser, inside…

It is important to note that today the term “web browser” covers not only the well-known standalone desktop applications such as Google Chrome, Apple Safari, Mozilla Firefox or Microsoft Edge, but also every application that embeds one of the corresponding web-browser engines, i.e. Chromium or WebKit. For example, TouchDesigner and Cycling ’74’s Max/MSP use the Chromium Embedded Framework (CEF), which can be embedded anywhere a patch can, making it possible to run a web browser inside Ableton Live through Max for Live. Another example is JUCE, the low-level C++ programming framework, which uses the same CEF. Then there are server-side runtimes like Node.js, which can also be considered a “web browser” of its own: it allows creating headless apps that still use all modern web standards, and it appears in the Max/MSP environment as Node for Max. Apple’s platforms and the Swift programming language are excellent for developing multimedia applications, and they also provide the WKWebView object, a platform-native view (based on WebKit) that can incorporate web content seamlessly into an app’s UI. All of the above embedded web engines support a full web-browsing experience and present HTML, CSS and JavaScript content alongside an app’s native views.
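
To make the “headless browser” idea concrete, here is a hedged sketch of a Node for Max script that relays messages from a Max patch to connected browsers over WebSocket. It assumes the max-api module that Node for Max provides plus the third-party ws package; the message name and port are invented for the example.

```js
// Hypothetical Node for Max script: relay messages from a Max patch to browsers over WebSocket.
// Assumes Node for Max's `max-api` module and the third-party `ws` package (npm install ws).
const Max = require("max-api");
const { WebSocketServer, WebSocket } = require("ws");

const wss = new WebSocketServer({ port: 8080 }); // port chosen arbitrarily for the example

// Messages like `note 60 127` sent to the node.script object's inlet end up here.
Max.addHandler("note", (pitch, velocity) => {
  const payload = JSON.stringify({ pitch, velocity });
  for (const client of wss.clients) {
    if (client.readyState === WebSocket.OPEN) client.send(payload);
  }
});

Max.post("WebSocket relay listening on ws://localhost:8080");
```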

SuperCollider inside the web browser? WasmGC to the rescue

Higher-level programming languages and environments such as Processing (based on Java), VVVV (based on C#) and SuperCollider are likely to rely on garbage collection as a standard feature. Fortunately, a recent web standard, WasmGC, allows garbage-collected programming languages to be compiled to WebAssembly. [In our 8th programming tutorial, Becky Brown already gave us a practical insight into experimenting with WebAssembly for creative coding, ed.] That feature opens endless possibilities for bringing well-known virtual machines into web browsers, instead of force-porting the existing variety of multimedia applications to JavaScript or low-level Wasm. One recent example is Hoot. Hoot includes a Scheme-to-Wasm compiler, allowing Scheme code to run in recent browsers as a first-class citizen. Hoot requires no external tools, provides its own Wasm assembler and linker, and additionally contains a fully featured development environment including a disassembler, interpreter and debugger. Direct REPL integration lets Hoot users hack comfortably without switching tools.
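
Hoot ships its own JavaScript reflection library for loading the modules it produces, so the snippet below is not Hoot-specific; it only shows the generic browser-side pattern for streaming, compiling and instantiating a Wasm binary (the file name and the exported function are placeholders).

```js
// Generic pattern for loading a WebAssembly module in the browser.
// "synth.wasm" and the exported "process" function are placeholders, not Hoot's actual output.
// Run as an ES module (<script type="module">) so top-level await is available.
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("synth.wasm"),
  { env: {} } // imports the module expects from the host; empty here
);

// Call an exported function, assuming the module exports one named "process".
console.log(instance.exports.process(42));
```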

We don’t need no JS. LiveViews and Streaming DAGs

LiveViews were introduced in the Phoenix web framework for the Elixir programming language. LiveViews are processes that receive events, update their state, and render updates to a page as diffs (a differential rendering technique that tracks minimal HTML changes). The LiveView programming model is declarative: events in LiveView are regular messages which may cause changes to the state. Once the state changes, the LiveView re-renders the relevant parts of its HTML template and pushes them to the browser, which updates the page in the most efficient manner. The creator of Sonic Pi, Sam Aaron, recently announced that he is working on making it truly collaborative:

“I’m building something new, super fun and experimental with Elixir (Phoenix Live View), Lua (Luerl) and Javascript (Web Audio). What does it do now? Well, it bleeps!”
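
Phoenix has its own client and wire format, so the following is only a conceptual sketch of the diff-push model described above, in plain JavaScript: the server pushes small HTML fragments over a WebSocket and the client patches just the affected element instead of re-rendering the page (the endpoint and message shape are invented for the example).

```js
// Conceptual sketch of the "push diffs, patch the DOM" idea; not Phoenix LiveView's actual protocol.
// Assumes a server at /live that sends messages like {"target": "#bpm", "html": "120"}.
const socket = new WebSocket(`ws://${location.host}/live`);

socket.addEventListener("message", (event) => {
  const { target, html } = JSON.parse(event.data);
  const el = document.querySelector(target);
  if (el) el.innerHTML = html; // patch only the changed fragment
});

// User events are plain messages sent back to the server-side state.
document.querySelector("#tempo-up")?.addEventListener("click", () => {
  socket.send(JSON.stringify({ event: "tempo_up" }));
});
```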

The problem that LiveViews and streaming DAGs try to solve is an impedance mismatch. There are client-side DOM effects and server-side queries, and it feels as though composing them should be well defined, but it is not, because of how the web works: the network forces us to split a composed program into two separate programs, breaking function composition. Your programming language is designed for single programs, but your application as a whole is a client/server distributed system. This impedance mismatch is the root cause of complexity in web development, and functional programming can solve it.

That approach offers an opportunity to remain an artist in a hosted language like Ruby, Clojure, Elixir, Smalltalk or Scheme, while still using the full potential of web technologies. I would also like to mention Hotwire as an alternative approach to building modern web applications with little JavaScript, by sending HTML instead of JSON over the wire, as well as streaming HTML out of order without JavaScript using the Declarative Shadow DOM, a recent web-browser feature.

AI for computer vision and WebXR

Today, resource-heavy real-time tasks that rely on computer vision can be driven by AI in web browsers. Interaction scenarios including face detection and face meshing, hand tracking, instant motion tracking, object detection and pose tracking can easily be delegated to current AI frameworks. The WebXR Device API is a critical component of WebXR, providing a set of APIs that allow web developers to access and use the various sensors and capabilities of XR devices, including head-mounted displays (HMDs), controllers and other input devices. This API brings a level of consistency and flexibility to immersive web experiences, enabling developers to create applications that work seamlessly across different platforms and XR devices. Much of the complex and resource-heavy computation such scenarios used to require can now be offloaded to AI. MediaPipe, for example, provides a suite of libraries and tools for quickly applying artificial intelligence (AI) and machine learning (ML) techniques in web applications.
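
As a sketch of how little code such delegation can take, here is a hedged example that tracks a performer's pose from a webcam using MediaPipe's tasks-vision package; the exact package, class and model-file names should be checked against the current MediaPipe documentation before use.

```js
// Hedged sketch: pose tracking in the browser with MediaPipe tasks-vision.
// Package, class and model names follow MediaPipe's docs at the time of writing; verify before use.
// Run as an ES module so top-level await is available.
import { FilesetResolver, PoseLandmarker } from "@mediapipe/tasks-vision";

const video = document.querySelector("video");
video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
await video.play();

const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/wasm"
);
const landmarker = await PoseLandmarker.createFromOptions(vision, {
  baseOptions: { modelAssetPath: "pose_landmarker_lite.task" }, // model file assumed to be served locally
  runningMode: "VIDEO",
});

function track() {
  const result = landmarker.detectForVideo(video, performance.now());
  // Each pose is an array of normalized {x, y, z} landmarks, ready to drive visuals or sound.
  if (result.landmarks.length > 0) console.log(result.landmarks[0][0]);
  requestAnimationFrame(track);
}
requestAnimationFrame(track);
```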

In conclusion, I would like to recall Rich Hickey, known as the creator of the Clojure programming language. He emphasizes the virtues of simplicity over those of easiness, showing that many who choose what is easy end up with complexity, and that the better way is to choose what is simple. I hope that the current path of web-browser evolution will follow this principle, and that the browser will become available as an instrument to a live performer on stage.
