VR software and hardware

📅 2024-08-14 📄 source

Linux-centric overview

General VR stack simplified:

[VR app]
↕ VR API
[VR runtime]
↕ stuff between runtime and VR hardware, incl. sensor protocols
[VR hardware]

All common VR apps use VR APIs, with VR libraries or runtimes implementing them. These abstract away the complexities and differences of VR hardware, presenting it in a uniform, convenient manner and taking care of the stuff app developers won't be happy to bother with. You can see VR APIs as an analog of 3D graphics APIs, and VR runtimes as an analog of the implementations of those APIs, some being specific to a single VR hardware family (like graphics drivers from GPU vendors), others more broad/modular (like Mesa). Each HMD usually has "perfect support" in a specific vendor-supplied VR runtime, but can also be supported by an alternative FOSS project (currently all effort is converging on the Monado project becoming something like a "Mesa for VR"). There are also "translation layers" between APIs and other "bridges" which allow using apps developed against one API with HMDs that don't have that API supported in their vendor's runtime, but do have another one which is simpler to develop against than to reverse engineer the HMD stuff (like all those D3D<->Vulkan wrappers). Finally, there are "VR streaming" solutions for using apps running on a PC with "standalone HMDs", which involves sending frames as a compressed video stream over a network.

Note: VR APIs don't handle 3D rendering; rendering is done using existing 3D graphics APIs. VR APIs are used to obtain data which affects rendering, such as the spatial positions of cameras for projections and FOV, and to pass rendered frames to the VR runtime, which can transform them or even use them as input to synthesize the actually displayed frames (compositing, reprojection, etc.)
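
Schematically, an app's per-frame interaction with a runtime through such an API looks roughly like this (a hedged sketch in Python-like pseudocode; runtime, swapchain and render_eye are hypothetical placeholders, not any real binding):

# Hypothetical VR app frame loop; names are illustrative, not a real VR API.
def frame_loop(runtime, swapchain, render_eye):
    while runtime.session_running():
        timing = runtime.wait_frame()          # runtime paces the app against display refresh
        views = runtime.locate_views(timing.predicted_display_time)  # per-eye pose + FOV
        images = []
        for view in views:
            # the rendering itself happens via a normal 3D graphics API (Vulkan/GL/D3D);
            # the VR API only supplies pose/FOV and receives the finished image
            images.append(render_eye(view.pose, view.fov, swapchain.acquire()))
        # the runtime may composite, distort and reproject these before scanout
        runtime.submit_frame(timing, views, images)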

Note: in the tech world the term "XR" (extended reality) is preferred as a generalization of the overlapping concepts of VR/AR (augmented reality)/MR (mixed reality); all of these, including VR as understood by most ("an HMD with a stereoscopic display, sensors and possibly some controllers"), are just use cases of more generic tech:

Note: another common non-VR XR use case is the "magic window", when a virtual environment is "projected" on a phone display (not placed in some "phone VR viewer"), adjusting to its spatial position. Besides that, there are developments in the area of volumetric displays.

Note: VR runtimes are userspace things and all VR-specific stuff is computed in userspace (well, most GPU drivers have their 3D graphics APIs implemented in userspace too, with the kernel parts doing only basic stuff). If a runtime has a modular architecture allowing it to add support for different HMDs, its modules are usually called drivers. Runtimes themselves usually aren't called drivers, even though monolithic ones kind of are.

There are now 3 relevant cross-vendor VR APIs:

Notable vendor-specific APIs:

HMDs

Wired PC VR HMD connectivity: most PC VR HMDs still use separate USB/HDMI/power cables, even if "bound" together. There was a push from Nvidia for an "all-in-one" USB-C connector and cable using USB-C DP alt mode for video out and USB PD for power delivery, which they called VirtualLink (it was just a subset of modern USB, nothing proprietary), but they quickly abandoned it; the "VirtualLink" USB-C port was only present on some RTX 20xx cards. An "all-in-one" USB-C port is also present on some newer AMD cards, including the RX 7000 series. HMDs which support a single USB-C cable are still hard to find. The standalone Oculus Quest supports the Oculus Link USB-C cable, but it sends video via proprietary Oculus software using H.264/H.265 compression.

Wired PC VR HMDs all use DP/HDMI for video out, but for sensor data there is no cross-vendor protocol standard. The closest one is probably Valve "lighthouse" tracking, which is used by multiple manufacturers ("minor independent" manufacturers tend to target the Valve ecosystem).

Most PC VR HMDs are designed for "room-scale VR" and require placing some stationary beacons or external cameras at least for positional tracking/6DoF. HMDs with "self-sufficient" positional tracking include WMR HMDs and standalone HMDs such as the Oculus Quest.

Note: 3DoF (3 degrees of freedom, i.e. 3 axes of rotation) is "looking around from a single point", requiring rotational tracking only; 6DoF (3 axes of rotation + 3 axes of translation) is "looking and moving around", requiring rotational+positional tracking. Rotational tracking is relatively easy to implement "well enough" for a device with a gyro (for inyalowda lucky to have a great "beacon" underneath); positional tracking is very hard, especially "inside-out" (using only sensors placed in the HMD) without beacons, and it's a common situation that a FOSS project for HMDs with a full set of sensors supports rotational tracking very well, while positional tracking is missing or stuck in an experimental state with poor experience.

Some specific HMDs which are of interest to me:

VR smoothing technologies

VR experience suffers a lot from the visuals of the frame currently seen by the user not matching their current position. 60 FPS is fine for most people, and 120 FPS for almost everyone, for perceiving the movement of on-screen objects as smooth, and HMD displays have appropriate refresh rates; however:

One approach to dealing with these problems is to modify rendered frames before display to better match the current/predicted user position, generally known as reprojection, https://en.wikipedia.org/wiki/Asynchronous_reprojection (called "asynchronous" because it's not synchronous with the app's rendering thread, and "reprojection" because it is seen as reprojecting the 3D scene from an altered view point/direction, even though with the most sophisticated solutions that's not the only thing which is happening). The simplest rotational reprojection treats projections as "celestial spheres"; more sophisticated ones use depth maps with special handling of VR UI elements and head-locked objects (a minimal sketch of the rotational case is given below). Chronology of developments in this area:

TODO: pros and cons, adoption outside of VR as alternative to Freesync?
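
A minimal sketch of the rotational ("celestial sphere") case mentioned above, assuming an undistorted pinhole eye buffer with intrinsics K (real runtimes also handle lens distortion, layers and prediction):

import numpy as np

def rotational_reprojection_homography(K, R_render, R_display):
    """Homography mapping pixels of a frame rendered at head orientation R_render
    to where they should appear for the newer orientation R_display.
    K: 3x3 pinhole intrinsics of the eye buffer.
    R_render, R_display: 3x3 camera-to-world rotation matrices from head tracking.
    Pure rotation only: scene depth and head translation are ignored."""
    return K @ R_display.T @ R_render @ np.linalg.inv(K)

# Warping: each pixel of the displayed frame samples the rendered frame at
# inv(H) @ [u, v, 1] (divided by the third coordinate); in practice this is a
# textured-quad / compute pass on the GPU, not a CPU loop.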

Another approach is to reduce the amount of computation needed to render frames

Other:

TODO: current state of VR smoothing in relevant VR stacks, especially in Monado

There are claims that Google Daydream supports reprojection, so Cardboard probably doesn't

Using a phone as a PC VR HMD (and Android-based standalone HMDs)

In such "VR streaming" setups PC software which receives tracking data from phone, feeds it to app and streams video rendered by app to phone is called "server", and phone software which streams tracking data to PC and receives and displays rendered video is called "client".

Currently the most mature partially-FOSS (runs on Linux via SteamVR) solution is the ALVR project. It consists of a server https://github.com/alvr-org/ALVR (implemented as a SteamVR driver) and a client https://github.com/alvr-org/PhoneVR (an Android app using the Cardboard library, i.e. 3DoF only), https://www.youtube.com/watch?v=_5k9htTdpuI

Standalone HMDs like the Oculus Quest also run Android, allow installing Android .apks and expose the OpenXR API to apps, which creates an appeal to have a single streaming solution "to rule them all". The whole stack would be like:

Alternatively, a phone Android VR streaming client could avoid relying on any VR API on the phone and just send sensor data and a camera stream to the PC for processing (one of the greatest failures of phone VR is the phone getting hot from computations, so it might be a good idea to offload them to the PC as much as possible, though it's a question whether compressing and transmitting the data will be lighter).

Relevant projects:

There are claims that the Oculus streaming solution has some advanced optimizations which are very hard to implement in alternative projects, or even impossible to implement without using some private interfaces available only to the Oculus streaming client. However, there are alternative proprietary streaming solutions such as the Steam Link app for Oculus Quest (currently works only with the Windows version of SteamVR) and Virtual Desktop, which seem to compete well (VD is often considered better than the official one). Complaints about ALVR mostly fall into 3 groups:

TODO: Android "VR helper" setting, OpenXR "standard loader"

TODO: which protocols are used between server and client? Video is universally sent as an H.264/H.265/AV1-compressed stream, but what about VR-specific metadata for frames, such as frame submission timing for reprojection? Clock sync? https://github.com/alvr-org/ALVR/wiki/How-ALVR-works#the-streaming-pipeline-overview .

QR codes of Cardboard VR viewers

The QR code actually encodes a URL of the format https://google.com/cardboard/cfd?p={device_params_string} (or the URL https://g.co/cardboard, which indicates the standard "Cardboard Viewer v1" device, or a URL which redirects to such a URL, see https://github.com/googlevr/cardboard/issues/444 https://developers.google.com/cardboard/reference/unity/class/Google/XR/Cardboard/Api#savedeviceparams ), where {device_params_string} is a base64-encoded serialized protobuf message of type DeviceParams, defined and explained in https://github.com/googlevr/cardboard/blob/master/proto/cardboard_device.proto . To parse it with Python ( https://protobuf.dev/getting-started/pythontutorial/ ):

import base64
# cardboard_device_pb2 is generated from cardboard_device.proto (linked above) with:
#   protoc --python_out=. cardboard_device.proto
import cardboard_device_pb2

# device_params_string is the base64 part of the scanned URL
device_params = cardboard_device_pb2.DeviceParams()
device_params.ParseFromString(base64.b64decode(device_params_string))
print(device_params)

Example output for device_params_string CgZIb21pZG8SDUhvbWlkbyAibWluaSIdhxZZPSW28309KhAAAEhCAABIQgAASEIAAEhCWAE1KVwPPToIexQuPs3MTD1QAGAC:

vendor: "Homido"
model: "Homido \"mini\""
screen_to_lens_distance: 0.05299999937415123
inter_lens_distance: 0.06199999898672104
left_eye_field_of_view_angles: 50.0
left_eye_field_of_view_angles: 50.0
left_eye_field_of_view_angles: 50.0
left_eye_field_of_view_angles: 50.0
tray_to_lens_distance: 0.03500000014901161
distortion_coefficients: 0.17000000178813934
distortion_coefficients: 0.05000000074505806
vertical_alignment: CENTER
primary_button: TOUCH
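
To get {device_params_string} out of a scanned URL in the first place, the standard library is enough; a small sketch (padding/URL-safe handling added because strings embedded in URLs are often unpadded web-safe base64):

import base64
from urllib.parse import urlparse, parse_qs

def device_params_from_url(url):
    """Extract and decode the p= parameter of a https://google.com/cardboard/cfd?p=... URL."""
    device_params_string = parse_qs(urlparse(url).query)["p"][0]
    padded = device_params_string + "=" * (-len(device_params_string) % 4)
    return base64.urlsafe_b64decode(padded)  # raw bytes for DeviceParams.ParseFromString()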

If "Skip" button is pressed in Cardboard app QR code scan dialog, Cardboard sets params for "Cardboard Viewer v1" which are defined in class CardboardV1DeviceParams in https://github.com/googlevr/cardboard/blob/master/sdk/device_params/android/java/com/google/cardboard/sdk/deviceparams/CardboardV1DeviceParams.java#L21

Linux VR desktop

I'm currently interested in a "simple" solution with a single desktop "projected" onto a VR surface, like a giant screen floating in space in front of the user, initially positioned at the HMD's virtual image plane, with the possibility to lock it to the head / re-center it in case it drifts.
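
Sizing such a floating screen is simple trigonometry; a sketch (the 2.0 m distance and 80-degree span are arbitrary example values, not properties of any particular HMD):

import math

def virtual_screen_width(distance_m, horizontal_span_deg):
    """Width of a flat screen quad placed distance_m in front of the viewer
    so that it spans horizontal_span_deg of the field of view."""
    return 2.0 * distance_m * math.tan(math.radians(horizontal_span_deg) / 2.0)

print(virtual_screen_width(2.0, 80.0))  # ~3.36 m wide; height follows from the desktop aspect ratio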

3DoF VR videos

This is the only type of VR video which is currently affordable to produce. It is filmed with a double camera with wide-angle lenses (180-degree) or 2 opposite-facing double cameras (360-degree, via "stitching" of two 180-degree videos). Perception is perfect only as long as head movements are restricted to positions in which the eyes' positions match the cameras' (only pitch/nodding movements, following the central vertical line of the video), but in practice the feeling of depth is good for most people when looking around "naturally".

To be more precise about the video format: there are different options, but almost all use "SBS EQR" (side-by-side equirectangular), popularized by YouTube (its Android app has a VR mode for viewing such videos in Cardboard VR viewers).
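
The mapping behind EQR is just longitude/latitude to texture coordinates, with the two eyes sharing one frame side by side; a rough sketch (axis conventions and which half belongs to which eye vary between players, so treat the signs and offsets as assumptions):

import math

def sbs_eqr_uv(direction, eye):
    """Texture coordinates (u, v) in a side-by-side equirectangular frame for a
    unit view direction (x = right, y = up, z = backward) and eye 'left'/'right'."""
    x, y, z = direction
    lon = math.atan2(x, -z)                  # longitude, 0 at the frame center
    lat = math.asin(max(-1.0, min(1.0, y)))  # latitude
    u = lon / (2.0 * math.pi) + 0.5          # 0..1 across a full 360-degree panorama
    v = 0.5 - lat / math.pi                  # 0..1 from top to bottom
    # side-by-side: each eye occupies a horizontal half of the frame;
    # for 180-degree content each half covers only the front hemisphere,
    # so the longitude scaling would be lon / math.pi + 0.5 instead
    return (u * 0.5 + (0.5 if eye == "right" else 0.0), v)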

Most websites with VR videos currently seem to use the "Delight XR" video player, which relies on browser WebXR support and can be recognized by its common UI and the <dl8-video> tag in the page source: https://delight-vr.com/documentation/dl8-video/

Solutions for local playback:

Note: smartphones (mass-produced mobile SoCs) are currently limited to 3840x2160 resolution for hardware-accelerated video decoding (for comparison, NVDEC on Nvidia cards can do 8192x8192); software decoding, even if available, usually doesn't play well with VR projection rendering pipelines. The easiest way to check the hardware-accelerated video decoding limitations of an Android smartphone is to open the "chrome://gpu" URL in Chrome. Probably the only mobile SoC family capable of ultra-high-resolution video decoding is Qualcomm Snapdragon XR, used by virtually all standalone HMDs.
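
A quick consequence of this limit for SBS EQR content (back-of-the-envelope arithmetic against the 3840x2160 ceiling above):

# At the common 3840x2160 decode ceiling, each eye of a side-by-side frame
# gets only 1920x2160 pixels stretched over the whole panorama.
frame_w, frame_h = 3840, 2160
eye_w, eye_h = frame_w // 2, frame_h
print(eye_w / 360, eye_h / 180)  # ~5.3 x 12 pixels per degree for 360-degree video
print(eye_w / 180, eye_h / 180)  # ~10.7 x 12 pixels per degree for 180-degree video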

TODO: future VR video formats with compression redundancy between half-frames, efficient usage of available rectangular frame size

TODO: affordable VR video cameras

aframe-vr-player notes

aframe-vr-player uses A-Frame, which uses three.js. three.js is a JS 3D graphics library; it doesn't touch HTML/DOM, it has its own model with a "renderer" rendering a "scene" using a "camera", the "scene" is filled with "mesh" 3D objects, and the only thing related to HTML/DOM is the renderer.domElement canvas, which is intended to be added to the page to display rendered frames somewhere. A-Frame is a JS XR framework: it uses the DOM to allow describing an XR scene as a hierarchy of custom HTML tags (<a-scene>, <a-entity>, <a-assets>, <a-image>), parses them, and sets up and renders the scene using three.js (and the WebXR API of course, using it to adjust the camera and handing the renderer canvas over to it so that the browser can pass projections to the lower-level VR runtime for display on the HMD).

Main entities in aframe-vr-player scene:

There's no projection maths in the player code: it just sets up spheres textured from the video and a camera with FOV and direction, and the underlying 3D stack does the rest.

There's no explicit access to the decoded video frame pixmap: it just sets the <video> element id as the texture source for the spheres, with an offset for the left/right half of the frame, and the underlying stack does the rest.

Stereo is disabled when the UI is shown by making the "default eye" sphere visible to the camera for both eyes and hiding the other one.

A-Frame scene hierarchy elements have attributes which reference A-Frame "components", and the values of these attributes contain arguments for those components.

Approximate flow:

6DoF VR videos

Currently the most impressive thing I've seen is the light fields demo from a Google team https://augmentedperception.github.io/deepviewvideo/ ; they also released a demo app on Steam with static 6DoF images https://store.steampowered.com/app/771310/Welcome_to_Light_Fields/

There's also the Pseudoscience VR player, popular in the Reddit 6DoF community, which seems to "guess" depth for 3DoF videos and for hidden parts of objects.

Games and sims

Physical-physiological FAQ

Other notes and links

