Entries tagged as 'vision'


The NUIGroup Google Summer of Code students (I was lucky enough to become one of them for PyMT this year) are asked to summarize their weekly activities in blog form. Now that the first week has passed, I figured I should quickly outline what I have been working on so far.

My proposal aims at developing more advanced text input methods for PyMT.

Work on PyMT

Some of the ideas I will implement draw heavily on spelling correction and suggestion, so PyMT needs to be able to talk to a spelling backend. Since PyMT should stay modular, I first implemented a new abstract core provider for spelling suggestions, which keeps PyMT independent of any specific library. I then wrote two concrete implementations of this provider:

  • An Enchant spelling backend. It uses the Enchant spelling library, which can itself be used with different kinds of dictionaries.
  • A spelling backend based on OSX’s AppKit spell checker.
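To illustrate the provider idea, here is a minimal Python sketch built on a hypothetical interface (`SpellingBase` with `check()` and `suggest()`); it is not PyMT's actual core provider API. `EnchantSpelling` wraps pyenchant's `enchant.Dict` (assumed to be installed), `WordListSpelling` is a toy fallback based on `difflib`, and `load_provider` is a hypothetical helper that picks the first backend that can be created on the current platform.

```python
from __future__ import annotations

import difflib
from abc import ABC, abstractmethod


class SpellingBase(ABC):
    """Hypothetical abstract spelling provider; widgets only use this interface."""

    @abstractmethod
    def check(self, word: str) -> bool:
        """Return True if *word* is spelled correctly."""

    @abstractmethod
    def suggest(self, word: str) -> list[str]:
        """Return candidate corrections for *word*, best match first."""


class EnchantSpelling(SpellingBase):
    """Backend wrapping pyenchant (assumed installed, with a dictionary)."""

    def __init__(self, lang: str = "en_US"):
        import enchant  # imported lazily: a missing library only disables this backend
        self._dict = enchant.Dict(lang)

    def check(self, word: str) -> bool:
        return self._dict.check(word)

    def suggest(self, word: str) -> list[str]:
        return self._dict.suggest(word)


class WordListSpelling(SpellingBase):
    """Toy fallback: a fixed word list plus difflib similarity matching."""

    def __init__(self, words):
        self._words = sorted({w.lower() for w in words})

    def check(self, word: str) -> bool:
        return word.lower() in self._words

    def suggest(self, word: str) -> list[str]:
        return difflib.get_close_matches(word.lower(), self._words, n=5, cutoff=0.6)


def load_provider(factories) -> SpellingBase:
    """Return the first backend whose factory succeeds on this platform."""
    for factory in factories:
        try:
            return factory()
        except Exception:  # library or dictionary not available; try the next one
            continue
    raise RuntimeError("no spelling backend available")
```

A widget would call something like `load_provider([EnchantSpelling, lambda: WordListSpelling(words)])` once and then only talk to the returned object, so adding an AppKit-based backend on OS X would not require any change to the widget itself.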

Once the foundation was in place, I ported a spell-checking virtual keyboard that Mathieu had developed earlier to the new API and added it to the code base. None of this is finished yet, and it needs some more love before I can merge it back into the master branch. You can check out the branch I’m currently working on here.

PyMT Virtual Keyboard with spell checking

Work on Movid

While spell checking is important for some of my upcoming widgets, other text input approaches make use of additional information provided by the tracking application. For example, one idea I had was to split the keyboard in half and dedicate one half to each hand. Each half would then automatically follow the position and orientation of its hand. In principle, further properties of the user’s hands (finger length, etc.) could also be taken into account when laying out the keyboards. For this I obviously need some kind of hand and fingertip tracking. Luckily, I have already implemented that for Movid:

Movid Hand Tracking
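The geometry behind a self-orienting keyboard half can be sketched as follows. This assumes a hypothetical tracker output (`Hand`, holding a palm centre and fingertip positions in screen pixels with y pointing down); it is not Movid’s actual hand tracking format. The half is rotated so that its "up" direction matches the direction from the palm to the mean fingertip position, then moved to the palm.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Hand:
    """Hypothetical tracker output: palm centre and fingertips in screen
    pixels, with y pointing down."""
    palm: tuple[float, float]
    fingertips: list[tuple[float, float]]


def finger_direction(hand: Hand) -> float:
    """Angle (radians) of the vector from the palm to the mean fingertip."""
    n = len(hand.fingertips)
    mx = sum(x for x, _ in hand.fingertips) / n
    my = sum(y for _, y in hand.fingertips) / n
    return math.atan2(my - hand.palm[1], mx - hand.palm[0])


def place_half(layout: dict[str, tuple[float, float]], hand: Hand):
    """Rotate and translate one keyboard half so it sits in front of *hand*.

    *layout* gives each key's offset from the palm in the half's local frame,
    where negative y means "towards the fingertips" of an upright hand.
    """
    # An upright hand points along -y (angle -pi/2) and needs no rotation.
    theta = finger_direction(hand) + math.pi / 2
    c, s = math.cos(theta), math.sin(theta)
    px, py = hand.palm
    return {
        key: (px + x * c - y * s, py + x * s + y * c)
        for key, (x, y) in layout.items()
    }
```

The same layout dictionary can be reused for both halves; only the `Hand` passed in changes, so each half follows its own hand independently.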

However, Movid is still not ready for end users: it lacks a calibration utility and a proper (generic!) blob tracker, which means I can’t use it yet either. So I continued working on both. Neither is finished yet, but I can see the light at the end of the tunnel (or rather, the light below my fingers):

Movid Calibration Prototype

I hope that we can finish all of this and push out a first version of Movid for end users soon. And obviously, I want to test my text input widgets on my multitouch table and not in the mouse simulator.

This concludes my work for week one. If you have any questions or are interested in PyMT or Movid, feel free to join our IRC channels, #pymt and #movid, on irc.freenode.net.

May 31, 2010 1:01:00 AM c++, coding, gsoc, hci, movid, multi-touch, nerdstuff, opensource, planet-pymt, planet-ubuntu, pymt, technology, vision

Hi everyone, I am glad to announce the birth of the Movid project: movid.org

Movid is an acronym; it stands for ‘Modular Open Vision Interaction Daemon’. It’s a cross-platform, Open Source vision tracker designed to be as modular as possible. Although the project is still young, it already features more than 20 modules, including blob and fiducial trackers as well as TUIO output. Movid is written in C++ and uses WOscLIB, cJSON, libevent, libfidtrack, jpeg-8 and XgetOpt.
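To show what TUIO output looks like on the wire, here is a minimal Python sketch (rather than Movid’s C++) that builds one TUIO 1.1 `/tuio/2Dcur` frame: an `alive` message, one `set` message per cursor, and an `fseq` message, wrapped in an OSC bundle. It hand-rolls OSC 1.0 encoding with `struct` instead of using WOscLIB, and it is not Movid’s actual TUIO module; velocity and acceleration are simply sent as zero.

```python
import struct


def osc_string(s: str) -> bytes:
    """OSC string: ASCII bytes, NUL-terminated, padded to a multiple of 4."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode an OSC message with int32 ('i'), float32 ('f') and string ('s') args."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        elif isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(tags) + payload


def osc_bundle(*messages: bytes) -> bytes:
    """Wrap messages in an OSC bundle with the 'immediately' time tag (1)."""
    out = osc_string("#bundle") + struct.pack(">Q", 1)
    for msg in messages:
        out += struct.pack(">i", len(msg)) + msg  # each element is size-prefixed
    return out


def tuio_2dcur_frame(cursors: dict, frame_id: int) -> bytes:
    """One TUIO 1.1 /tuio/2Dcur frame: alive, one set per cursor, then fseq.

    *cursors* maps session id -> (x, y), normalised to the range [0, 1].
    Velocity (X, Y) and motion acceleration (m) are sent as 0.0 here.
    """
    messages = [osc_message("/tuio/2Dcur", "alive", *cursors)]
    for sid, (x, y) in cursors.items():
        messages.append(osc_message("/tuio/2Dcur", "set", sid, float(x), float(y), 0.0, 0.0, 0.0))
    messages.append(osc_message("/tuio/2Dcur", "fseq", frame_id))
    return osc_bundle(*messages)
```

The resulting bytes would typically be sent over UDP to port 3333, the conventional TUIO port, e.g. with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(data, ("127.0.0.1", 3333))`.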

Movid has several key characteristics:

  • Cross-platform: It works under Windows, Linux and MacOSX.
  • Daemon: You can run the program without a GUI and control it from another computer over the network.
  • Threading: Each module can be run inside a thread. This means that you can finally fully utilize your multi-core processor!
  • Remote API: The daemon can be controlled through a JSON API. This also means that you can write your own GUI, e.g. in Flash, and control the daemon from any application that can make HTTP requests!
  • Full HTML5 embedded administration: By default, the daemon acts as an HTTP server that serves an HTML5 administration interface. You can control and modify the tracking pipeline in real time and adjust many parameters.
  • Image streaming: Most modules process images, and your application or GUI can get the output image of any module via a stream. This lets your applications show any image from the pipeline or use it for advanced features.
  • Flexible pipeline: Unlike other applications, Movid allows you to fine-tune your image processing pipeline if you are an expert. You can create new pipelines, add modules/filters and change their parameters in real time.
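The threading and flexible-pipeline ideas above can be modelled in a few lines. This is a Python toy model, not Movid’s C++ module API: the class names, the `set` method and the list-of-pixels "image" are all hypothetical. Each module runs in its own thread, receives frames through a queue, forwards results to the modules connected after it, and accepts parameter changes while running.

```python
import queue
import threading

STOP = object()  # sentinel that shuts a module down and is forwarded downstream


class Module(threading.Thread):
    """Hypothetical pipeline stage running in its own thread."""

    def __init__(self, name: str, **params):
        super().__init__(name=name, daemon=True)
        self.params = dict(params)
        self._params_lock = threading.Lock()
        self.input = queue.Queue()
        self.outputs = []

    def connect(self, other: "Module") -> "Module":
        """Send this module's output to *other*; returns *other* for chaining."""
        self.outputs.append(other.input)
        return other

    def set(self, key: str, value) -> None:
        """Change a parameter; safe to call while the thread is running."""
        with self._params_lock:
            self.params[key] = value

    def process(self, frame, params):
        return frame  # pass-through by default

    def run(self) -> None:
        while True:
            frame = self.input.get()
            if frame is not STOP:
                with self._params_lock:  # one consistent parameter snapshot per frame
                    params = dict(self.params)
                frame = self.process(frame, params)
            for out in self.outputs:
                out.put(frame)
            if frame is STOP:
                return


class Invert(Module):
    def process(self, frame, params):
        return [255 - px for px in frame]


class Threshold(Module):
    def process(self, frame, params):
        t = params.get("threshold", 128)
        return [255 if px >= t else 0 for px in frame]


class Collector(Module):
    """End of the pipeline: stores every processed frame."""

    def __init__(self, name: str):
        super().__init__(name)
        self.results = []

    def process(self, frame, params):
        self.results.append(frame)
        return frame


def run_pipeline(frames, threshold=128):
    """Build invert -> threshold -> collector, feed *frames*, wait for the end."""
    head = Invert("invert")
    thresh = Threshold("threshold", threshold=threshold)
    tail = Collector("collector")
    head.connect(thresh).connect(tail)
    for module in (head, thresh, tail):
        module.start()
    for frame in frames:
        head.input.put(frame)
    head.input.put(STOP)
    tail.join()
    return tail.results
```

Calling `thresh.set("threshold", 90)` while frames are flowing only affects frames processed after the change, because each frame uses a snapshot of the parameters. In CPython the GIL limits true parallelism for pure-Python stages, so this model shows the structure rather than the multi-core speedup a C++ implementation can get.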

However, Movid is not ready for users yet, since a few modules, such as calibration, are still missing. We are currently looking for developers to help us with further development.

More info:

Apr 19, 2010 10:31:00 PM hci, movid, multi-touch, nerdstuff, opensource, planet-pymt, planet-ubuntu, technology, vision