instantreality 1.0

Using VisionLib in X3DOM with Remote Tracking example

VisionLib, InstantIO, X3DOM, Streaming, Tracking
Author(s): Daniel Tanneberg, Manuel Olbrich, Pavel Rojtberg
Date: 2014-07-31

Summary: This short tutorial shows how to exchange images and matrices between VisionLib and X3DOM using websockets, with a remote tracking application as the example.

Warning: InstantReality 2.4 is necessary for this to work. Later versions don't include the computer vision framework.


The purpose of connecting an application page to VisionLib is to outsource computationally expensive operations to a server, e.g. for mobile applications. In a tracking application, for example, the application page streams the webcam image to a server, the vision-based tracking is computed on the server, and only a matrix is sent back to the client, which then uses it to update the scene. Another setup is to stream both the image and the pose from a server to a client, which causes less overhead and enables the use of industrial cameras with special interfaces.

Configure & start InstantIO server

Since VisionLib is only a computer vision library, it needs to be embedded in an application that starts it and feeds it with data. InstantIO provides access to specialized hardware and software for virtual and mixed reality applications, and is used in this tutorial to interface with VisionLib from our web application. To use VisionLib over websockets, the InstantIO server has to be configured and started with the following parameters: a Web node with a given Port provides access to the server, and a VisionLib node takes the application configuration file as its parameter. Save the following code in a file instantio-config.xml and start InstantIO from the console like this: "PATH_TO_INSTANT\InstantIO.exe" instantio-config.xml.

Code: InstantIO Configuration File

<?xml version="1.0" encoding="UTF-8"?>
<InstantIO versionMajor='1' versionMinor='0'>
  <Node label='Web' type='Web'>
    <Parameter name='Port' value='8888'/>
  </Node>
  <Node label='vl' type='VisionLib'>
    <!-- set value to the path of your VisionLib application configuration file -->
    <Parameter name='ConfigFile' value=''/>
  </Node>
</InstantIO>

Receive image and matrix stream from server

Now that InstantIO is running, we can create the client, i.e. the application page. As a simple start, the following example shows how to connect to VisionLib through websockets and how to use an image stream and a matrix stream received from the server to manipulate your local scene.

Prepare the websockets

In this example the webcam is accessed through VisionLib, which can be useful to save overhead or to directly manipulate camera settings such as focus, which is not possible through the browser. To connect your application page to VisionLib you need to create one websocket per stream, i.e. one for images and one for matrices. The following code provides a small example and shows how to use the streams.

The function start_image creates a websocket for the image stream, given the location of InstantIO and the name, which is the id of the image object that displays the received image on your application page. Each time an image is received, it is set as the source of the image object. Since the received images are base64 encoded, we need to prepend a header to the string so that the browser knows how to interpret it.

The function start_mat creates a websocket for the matrix stream, given the location of InstantIO and the name, which is the id of the viewpoint in your scene. Each time a matrix is received, the _viewMatrix of this viewpoint is set to the received matrix, and you have to tell the runtime API to render a new frame.

Code: Image and Matrix receiving functions

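// create a websocket, falling back to the prefixed constructor of old Firefox versions
function websocket(location) {
    if (window.MozWebSocket)
        return new MozWebSocket(location);
    else
        return new WebSocket(location);
}

function start_image(location, name) {
    var socket_img = websocket(location);
    socket_img.onmessage = function (event) {
        // the payload is the base64 encoded JPEG; prepend the data URL header
        document.getElementById(name).src = "data:image/jpeg;base64," + event.data;
    };
}

function start_mat(location, name) {
    var socket_mat = websocket(location);
    socket_mat.onmessage = function(event) {
        var mat = x3dom.fields.SFMatrix4f.identity();
        // assumption: the message payload holds the 16 matrix values as a string
        mat.setValueByStr(event.data);
        var vp = document.getElementById(name);
        vp._x3domNode._viewMatrix = mat;
        // request a new frame from the X3DOM runtime
        document.getElementById('x3dElement').runtime.canvas.doc.needRender = true;
    };
}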
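
The tutorial does not spell out the wire format of the matrix messages. As a minimal sketch, assuming each message is a string of 16 numbers separated by whitespace and/or commas, the payload could be checked before it is applied to the scene. The helper name parseMatrixString is hypothetical and not part of X3DOM or InstantIO.

```javascript
// Hypothetical helper (not part of X3DOM or InstantIO): parse a matrix
// message into an array of 16 numbers. The wire format (16 numbers
// separated by whitespace and/or commas) is an assumption.
function parseMatrixString(payload) {
    if (typeof payload !== 'string') {
        return null;
    }
    // split on whitespace and/or commas, dropping empty tokens
    var tokens = payload.trim().split(/[\s,]+/).filter(function (t) {
        return t.length > 0;
    });
    if (tokens.length !== 16) {
        return null;
    }
    var values = tokens.map(Number);
    // reject payloads that contain non-numeric tokens
    for (var i = 0; i < values.length; i++) {
        if (!isFinite(values[i])) {
            return null;
        }
    }
    return values;
}
```

Inside an onmessage handler, a null result could simply be skipped, so that a malformed message does not overwrite the current view matrix.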

Open the websockets

With the code above you just have to open the websockets and associate them with the specific objects on the application page and in the VisionLib node. InstantIO_address is the ip:port location of the running InstantIO server, e.g. "".

Code: Open the websockets

start_mat("ws://" + InstantIO_address + "/InstantIO/element/vl/InstantCamera_ModelView/data.string",'ID_OF_VIEWPOINT');
start_image("ws://" + InstantIO_address + "/InstantIO/element/vl/InstantVideo/data.string","ID_OF_IMAGE");

Now, whenever an image is received, it is automatically set as the source of the image object on your application page, and whenever a matrix is received, it is set as the _viewMatrix of the viewpoint of the X3D scene and a new frame is rendered.
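
The websocket URLs used in this tutorial all follow the same pattern, so a small helper can keep them in one place. The following is only a sketch: the function name instantIOSlotUrl and the address localhost:8888 are assumptions, while the path layout is copied from the calls in this tutorial.

```javascript
// Hypothetical helper: build the websocket URL of a slot of an InstantIO node.
// The path layout follows the start_mat/start_image calls in this tutorial;
// nodeLabel is the label of the VisionLib node in instantio-config.xml ('vl').
function instantIOSlotUrl(address, nodeLabel, slotName) {
    return "ws://" + address + "/InstantIO/element/" + nodeLabel + "/" + slotName + "/data.string";
}

// 'localhost:8888' is an assumed address (port 8888 as in the configuration file)
var videoUrl = instantIOSlotUrl("localhost:8888", "vl", "InstantVideo");
console.log(videoUrl); // prints "ws://localhost:8888/InstantIO/element/vl/InstantVideo/data.string"
```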

Send image stream to server & receive matrix stream from server (Remote Tracking)

In this example the webcam image is captured by the web browser. The application page (the client) sends each image to the server and receives a matrix in return, which it uses to manipulate the local scene, so this time the communication between client and server is two-way. With this setup, computation can be outsourced to a dedicated machine while the client stays relatively thin.

The webcam is accessed via WebRTC's getUserMedia interface and streamed to a video object, then drawn to a canvas on the page, from which each frame is sent to the server. Receiving the matrix works almost as in the previous example; the difference is that the received matrix is now applied to a MatrixTransform node instead of the viewpoint.

Code: Camera access, drawing and streaming functions

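// shared state of the streaming functions
var streaming = false, send = false, video, canvas, context, sendsocket;

// get access to the camera and start drawing/streaming
function startSendCam() {
    streaming = false;
    navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia || navigator.msGetUserMedia;

    if (navigator.getUserMedia) {
        var constraints = {video:true, audio:false};
        navigator.getUserMedia(constraints,
                function (stream) {
                    // webcam to video (current browsers expect srcObject instead of an object URL)
                    video = document.getElementById('videoID');
                    if ('srcObject' in video) video.srcObject = stream;
                    else video.src = URL.createObjectURL(stream);
                    video.play();

                    // stream the video to the canvas
                    canvas = document.getElementById('canvasID');
                    context = canvas.getContext('2d');
                    streaming = true;

                    drawVideo(video, context, canvas.width,canvas.height);
                },
                function(error) {console.log("error while trying to access the webcam."); console.log(error);}
        );
    }
}

// update the picture and call send function
function drawVideo(v,c,w,h) {
    if(streaming) {
        c.drawImage(v, 0, 0, w, h);
        streamPicture();
        // assumption: redraw about every 40 ms; the original interval is not known
        setTimeout(drawVideo, 40, v, c, w, h);
    }
}

// send the picture to the server
function streamPicture() {
    if(streaming) {
        var c = document.getElementById('canvasID');
        var picture_data = c.toDataURL('image/jpeg');
        // strip the data URL header, the server expects the plain base64 string
        picture_data = picture_data.replace(/^data:image\/(png|jpeg);base64,/, "");
        // only send if the socket has not queued up too much unsent data
        if(sendsocket.bufferedAmount<picture_data.length*2) {
            send = true;
            sendsocket.send(picture_data);
        }
    }
}

// open the send socket
function start_image_send(location) {
    sendsocket = websocket(location);
    sendsocket.onopen = function () {
        // assumption: start capturing as soon as the socket is open
        startSendCam();
    };
}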
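
The two decisions made in streamPicture, stripping the data URL header and skipping a frame while the socket still has data queued, can be isolated into small functions that are easy to check without a browser. This is a sketch under the assumption that the threshold of twice the payload length is the intended back-pressure rule; the helper names are hypothetical.

```javascript
// Hypothetical helpers, sketched from the logic in streamPicture.

// Remove the "data:image/...;base64," header that canvas.toDataURL() prepends,
// leaving only the base64 payload.
function stripDataUrlHeader(dataUrl) {
    return dataUrl.replace(/^data:image\/(png|jpeg);base64,/, "");
}

// Decide whether the next frame should be sent. bufferedAmount is the amount
// of data still queued on the websocket; a frame is skipped once roughly two
// frames' worth of data is waiting, so a slow connection drops frames
// instead of building up an ever-growing backlog.
function shouldSendFrame(bufferedAmount, payloadLength) {
    return bufferedAmount < payloadLength * 2;
}
```

Dropping frames rather than queueing them keeps the latency between the camera image and the returned pose bounded, which matters more for tracking than a complete image sequence.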

To calibrate your scene with the camera and tracking, you have to override the viewpoint's getProjectionMatrix function. Open another websocket to TrackedObject1Camera_Projection to receive the camera projection matrix, save it, and close the socket again. Then override the getProjectionMatrix function so that it simply returns the saved matrix.

Code: Override the viewpoint's getProjectionMatrix function

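// MYAPP is the application's own namespace object; it must exist before this code runs
var pm_socket = websocket("ws://" + InstantIO_address + "/InstantIO/element/vl/TrackedObject1Camera_Projection/data.string");
pm_socket.onmessage = function(event) {
    // save the projection matrix string, then close the socket again
    MYAPP.projectionMatrix = event.data;
    if(MYAPP.projectionMatrix != null)
        pm_socket.close();
};

var vp = document.getElementById('VIEWPOINT_ID');
vp._x3domNode.getProjectionMatrix = function() {
    var mat = x3dom.fields.SFMatrix4f.identity();
    // assumption: the saved payload holds the 16 matrix values as a string;
    // until it has arrived, the identity matrix is returned
    if(MYAPP.projectionMatrix != null)
        mat.setValueByStr(MYAPP.projectionMatrix);
    return mat;
};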
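
The "receive once, then close" pattern described for the projection matrix can be sketched as a small helper. The name receiveOnce is hypothetical; the function only relies on the onmessage property and the close() method that a browser WebSocket provides, so it can be tried out with a stand-in object.

```javascript
// Hypothetical helper: hand the first message of a socket to a callback and
// then close the socket. 'socket' only needs an onmessage property and a
// close() method, as a browser WebSocket has.
function receiveOnce(socket, callback) {
    socket.onmessage = function (event) {
        socket.onmessage = null;   // ignore anything that arrives while closing
        socket.close();
        callback(event.data);
    };
}

// Usage with a stand-in object instead of a real WebSocket:
var fakeSocket = { closed: false, close: function () { this.closed = true; } };
var saved = null;
receiveOnce(fakeSocket, function (data) { saved = data; });
fakeSocket.onmessage({ data: "1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1" });
```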

The modified matrix receive function uses the setAttribute function to set the new matrix value, so you don't have to tell the runtime API to render a new frame manually.

Code: Modified matrix receive function

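// modified matrix receive function
function start_mat(location) {
    socket_mat = websocket(location);

    socket_mat.onmessage = function(event) {
        var mat = x3dom.fields.SFMatrix4f.identity();
        // assumption: the message payload holds the 16 matrix values as a string
        mat.setValueByStr(event.data);

        // setting the attribute triggers a re-render of the scene
        var mt = document.getElementById('MatrixTransform_ID');
        mt.setAttribute('matrix', mat.toGL().join(","));
    };
}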
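
The call mat.toGL().join(",") turns the matrix into a comma-separated string in column-major order. To illustrate what that reordering does, here is a plain JavaScript sketch that works on a flat array of 16 numbers, assuming the input is stored row by row. The function rowMajorToColumnMajor is hypothetical and only mimics the reordering; it is not the X3DOM implementation.

```javascript
// Hypothetical illustration of the row-major to column-major reordering.
// 'm' is a flat array of 16 numbers holding a 4x4 matrix row by row.
function rowMajorToColumnMajor(m) {
    if (m.length !== 16) {
        throw new Error("expected 16 matrix values");
    }
    var out = new Array(16);
    for (var row = 0; row < 4; row++) {
        for (var col = 0; col < 4; col++) {
            // element (row, col) moves from index row*4+col to col*4+row
            out[col * 4 + row] = m[row * 4 + col];
        }
    }
    return out;
}

// A translation by (5, 6, 7), written row by row
var translation = [1, 0, 0, 5,
                   0, 1, 0, 6,
                   0, 0, 1, 7,
                   0, 0, 0, 1];
console.log(rowMajorToColumnMajor(translation).join(","));
// prints "1,0,0,0,0,1,0,0,0,0,1,0,5,6,7,1"
```

In the column-major string the translation ends up in positions 13 to 15, which is a quick way to check by eye whether a received pose has the expected layout.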

Connect the websockets again with the specified objects in the VisionLib node.

Code: Modified opening of the websockets

// open the websocket to send an image and receive matrix
start_image_send("ws://" + InstantIO_address + "/InstantIO/element/vl/ImageRGB/data.string");
start_mat("ws://" + InstantIO_address + "/InstantIO/element/vl/TrackedObject1Camera_ModelView/data.string");


In the provided example, the browser streams a webcam image to a server, which performs marker-based tracking and returns the matrix needed to transform the object relative to the camera position. It is based on the code snippets above and the Marker Tracking tutorial.


