Archive for the 'Aicas' Category

These days

Wednesday, June 13th, 2007

.. I’m working on a couple of things in parallel:

On one front I’m trying to provide a replacement for the graphics rasterizer in OpenJDK. As far as I can see to this point, the rasterizer I implemented for GNU Classpath should be fine for a start. I offered this to OpenJDK, but there seem to be a couple of legal and organizational hurdles to take. I am discussing how to best solve these issues and move forward with the code. It looks like going through the copyright grantback procedure with the FSF for the code in question is hard to avoid. In the meantime, Jim from the Java2D team has factored out a preliminary interface for the rasterizer, and Mark Reinhold offered to provide early read-only VCS access to the Java2D workspace. Thanks!

Then I am working on integrating the OpenJDK class library into the JamaicaVM. It turned out that the package-after-package approach is much more painful than expected, due to massivly intertwined dependencies. I even spotted a couple of places where private methods in ‘alien’ packages are called via reflection (urgs). I wonder how that went through QA, and I wonder what happens when one such method changes… Anyway, with the release of IcedTea I switched to importing all classes at once, with the help of the IcedTea classes this was surprisingly painless. Now I’m working with my collegues to get the native things done. So far we found that they even have some kind of VM/native interface in place, at least for the classes that I looked at so far. Basically, platform dependend syscalls are abstracted out as JVM_* functions. I hope to get a HelloWorld running this week.

Related to the OpenJDK/JamaicaVM porting work, I am working to get the Classpath GTK peers, and the other AWT backends in Jamaica to work with the JDK. The biggest problem here seem to be the differences in Image and Font handling, which are abstracted out in a backend-independent way in GNU Classpath, but seem to be implemented rather monolithically in OpenJDK. Dunno how to solve that. It might be possible to use the OpenJDK implementations when the encumbered pieces in the font and graphics rasterizer are solved. So far I think we have to go with a bunch of GNU Classpath classes in java.awt.***.

Improving Anti-Aliasing

Saturday, May 26th, 2007

I was able to improve the antialiasing rasterizer significantly. So far I calculated the coverage on ‘intersection pixels’ by scanning each scanline 4 times (or any other power of 2 for that matter) and counting the intersections within each pixels. The coverage (delta) for each pixel is then calculated as count / max. For example, if we count 4 intersections in a pixel when we did 8 scans, then we have a coverage delta of 0.5 for that pixel.

So far so good. But it quickly turned out that this approach is flawed. Imagine for example a straight vertical line right through the middle of a pixel. The above calculation would result in the maximum number of intersections, and thus the coverage delta is 1.0 rather than the expected 0.5. Visually this looked like vertical edges are ‘cut off’, in contrast to the horizontal edges which are smooth.

Luckily this problem can be solved without any penalty. I only needed to account for the actual intersection point inside the pixel. So now I not only count the intersections, but also add up the ‘fractions’ that the intersections create. I can increase the accuracy by a factor of 16 (or more, but I reserve only 4 binary digits accuracy for the ‘horizonal’ coverage, and leave up to another 4 digits for the ‘vertical’ coverage). Of course, the actual math behind all that is slightly more complex than this, but you get the idea.

This makes the rasterizer good enough for rendering arbitrary shapes smoothly. Even TrueType glyphs can be rendered nicely with it. The following screenshot uses 4x scanning only with 16x horizontal resolution.

I proposed this rasterizer as possible replacement for the encumbered graphics rasterizer in OpenJDK. Let’s see if it turns out to be feasible. If yes, then it’s going to be interesting to solve the legal issues around it. I think the worst (and likely) case would be to do the copyright grantback dance with the FSF and provide the code to OpenJDK via the SCA. If not it’s going to be used anyway in the Escher AWT peers, in the JamaicaVM graphics backends and font rasterizer and hopefully soon in the Odonata toolkit (which has been proposed as new OpenJDK project, which is an exciting idea for itself).

I’m off now for two weeks, celebrating my birthday tomorrow and my marriage in a week, hurray! Must be my lucky year. Absolutely incredible so far. :-D

Efficient Anti-Aliasing

Friday, May 18th, 2007

These days I did what I wanted to do for a long time, that is implement an efficient anti-aliasing for the rasterizer of the Escher peers (this rasterizer can be used in arbitrary other scenarios as well though). This was mainly triggered by OpenJDK’s need for a replacement for their encumbered rasterizer, and I hope that mine is probably a good fit. Let me outline how it works.

Before doing anti-aliasing, let me show how shapes are generally rendered to a raster in general:

Rendering a glyph

Basically, we scan the shape line by line, and at the intersection points we ‘turn on’ or ‘turn off’ the pixel at that intersection point. In order to do this efficiently, we first sort the edges of the shape according to their Y and X coordinates.

Ok, but what about anti-aliasing? Basically it’s the same, only that we scan each line more than once. Usually each line is scanned a -power-of-2 times, for example 8. Depending on how often we intersect the shape in a pixel, we can calculate a coverage value. This coverage value is then used as alpha value when filling the actual pixel.

The interesting part here is how the coverage values are managed.  A simple and naive approach is of course to store them in an array, one entry for each pixel. Now if we scan a line we proceed just like in non-AA rasterization and turn on (add 1) to the entry in this array that corresponds to the pixel when we ‘enter’ the shape, do the same for all pixel entries in between until we ‘leave’ the shape for that scanline. This is repeated for the same pixel line 2^N times. In the end we have an array that has the correct coverage values for each pixel.

Of course, iterating over the array and adding up the coverage entries isn’t exactly efficient. A better idea is to only store deltas. So, instead of adding 1 to all the pixel entries ‘inside’ the shape, we only add 1 at the entry pixel and substract 1 at the exit pixel. This is much better.

We must still iterate over the whole array though when we want to display it to the destination surface, to find consecutive chunks of equal pixels. Imagine that for a simple 600 pixels wide rectangle. Pretty bad. So what I did today and the last couple of days is design a datastructure that manages the coverage values in a much more efficient way. Basically, it is a simple linked list of deltas together with their X coordinates. This list is always sorted according to the X coordinates. Each time we scan a line, we insert one entry into that list for each intersection point. In order to find the insertion point, we do a linear search. A linear search? Haven’t we all learned that this is pretty inefficient. Yes. But here we make use of the fact that the scanline is always scanned in a linear fashion, from left to right, and that the list is always sorted. The data structure has a pointer to the last item that got inserted, and starting from there, it is not possible (or at least highly unlikely) that the next insertion point is more than 2 items away. When the next scanline inside the same Y pixel row is scanned, we ‘rewind’ the data structure to point to the first item again, and start searching there. Finally, when doing the actual rendering, we can simply iterate this list, and fill complete segments in one go. This can be quite efficiently implemented on most backends (for example, for a framebuffer it’s basically a simple memset operation when the color is opaque).

A nice side effect of this technique is that non-anti-aliased rendering can now be handled as a special case of anti-aliased rendering, with no performance penalty. Another little detail of the implementation is that the datastructure doesn’t throw away and re-allocate its entries, but keeps unused items for later reuse. This way, the scanline converter usually needs allocation, after some initial setup of its tables and structures.

This week

Friday, March 9th, 2007

.. so far:

  • I’ve finally managed to get a JOGL Demo running AOT-compiled on JamaicaVM. Performance is now comparable to Sun’s JDK, with the added benefit of realtime GC behaviour. Yay!
  • I wrote quite a lot on my thesis. I’m pretty good in my time plan and am quite certain to have it finished on april 1st.
  • My girlfriend and I were watching a TV movie (we very rarely do this, only a couple of times a year), about WW2 refugees from eastern preussian, which was especially interesting to me, because my grandparents were also from there. It’s hard for me to imagine what they must have gone through, walking >1000km through a freaking cold winter, beeing in constant danger of beeing attacked by both the red army and even the german wehrmacht, hunger, rape and all that. I wish they’d still be alive so they could tell me first hand. I really should get up and write my aunt in Berlin, who is the last still-alive relative who was there.
  • I learned about F3. Which is really an interesting project. Or at least, will be when it’s going public.
  • A couple of interesting things have been moving regarding my future job situation. But I won’t tell you.. ;-)

JOGL on Jamaica

Thursday, March 1st, 2007

I’m excited to have a first Jogl demo working on the JamaicaVM. That was a bit of a fight though, JOGL seems quite demanding on a couple of edges. I had to fix dynamic library loading (not something that Jamaica was designed for), implement the direct byte buffer JNI functions (missing so far), fix some issues with GNU Classpath’s AWT (I’m checking this in soon) as well as the GTK peers (basically getting rid of the NSA stuff in there), fix a couple of more issues in the VM (especially that super large GL interface and the impementing class caused some headaches, or should I say segfaults…). Here’s the obligatory screenshot (nothing fancy, but still). Thanks go to Torsten, Fridjof and Ingo for helping here and there.

Ah, and I must add that JOGL 1.1 is a little stupid. It get’s hold of some Sun internal JDK class (I guess) via reflection to access attributes of the X implementation. That’s pretty stupid on a non-Sun VM. Especially since there is JAWT for things like that. I should really file a bug before they ship that crap.

JOGL on JamaicaVM

Odds and Ends

Wednesday, January 31st, 2007
  • Today I finally grokked how the Java2D Image API could be implemented. Basically the biggest problem (to me) always has been the BufferedImage constructor, which allowed BIes of all kinds to be created. What about image formats that are not supported by the native rendering engine? How are we supposed to render into that? Well, the solution would be to have 2 buffers like:
    RE --> nBuffer  < -- > jBuffer <-- Java2D API

    where the native buffer has a format that is supported by the RE and the java buffer (I call it so, it is not required to have that in Java, could be any memory area thanks to the fine DataBuffer abstraction) holds the data in the requested ColorModel/SampleModel. Well, the obvious problem here is when to sync the data and what to sync. Fortunately (that’s what I discovered today), BufferedImage implements WritableRenderedImage, which provides a couple of methods to grab the buffer for writing and release it. This is find, as it provides very nice syncpoints.One remaining obstactle is when the RE does not support a super-format (I call it like this. For instance, ARGB8888 is a superformat of RGB565) of the requested format. For instance, when the RE only supports ARGB1888 (which is the case for one system I am currently implementing this API for) but the app requires ARGB8888. How is the RE supposed to render into that. The API is a little weak in this respect as it doesn’t allow to reject the request. The only real solution would be to have a Java only renderer do the rendering, but this is very ugly for several reasons: The rendering would be noticably different and noticably slow (well the latter isn’t that bad. what do you expect when you request something not supported by the HW). Any comments and ideas on this are very appreciated.All in all, nowadays I feel about Java2D almost like I feel about NIO: Quite confusing for the start, but once things are sorted out and have found their place in the mind-picture, it seems relativly well thought out and nice. Only problem with Java2D probably is that it carries quite some cruft from the <= AWT1.1 days, which makes it even more confusing.

  • Today a collegue pointed me to this, makes my mind boggle. I copy it into here as a riddle to solve. So please make up your mind and find out what this is supposed to do. And yes, this is legal C code compilable by GCC. :-)
    switch ( count % 8 )  /* count > 0 assumed */
    {
    case 0:        do {  *to++ = *from++;
    case 7:              *to++ = *from++;
    case 6:              *to++ = *from++;
    case 5:              *to++ = *from++;
    case 4:              *to++ = *from++;
    case 3:              *to++ = *from++;
    case 2:              *to++ = *from++;
    case 1:              *to++ = *from++;
    } while (( count -= 8 ) > 0);
    }
  • Found an interesting JSR: Application Framework for Swing Gotta have a deeper look at it ASAP.

Happy interpreter hacking

Sunday, January 28th, 2007

The last couple of weeks I got my hands dirty (really dirty) with VM hacking, namely the Java interpreter of the JamaicaVM. I’m applying a couple of well known optimization techniques (besides loads of cleanup). The most promising so far (I’ve done only prototype-like implementations so far) are:

  • Superinstructions. Combines frequent ‘cheap’ instruction sequences into bigger units. For instance, in order to add two values, a common sequence is iload* iload* add istore*. Of course, there are variations (like only one load or without store and many more). Such a sequence would be combined into a new opcode which would do all this without accessing the operand stack. The net effect is that the overhead of moving stuff onto and from the stack is saved as well as the dispatch overhead between these instructions (which is quite significant as it is at least as expensive as the operations themselves in these cheap cases). Theoretically it would be possible to eliminate operand stack completely (this is called stack folding and implemented in one of the java processors iirc — quite amazing stuff), but this would unnecessarily bloat up the binary. Instead I’m doing some statistics to find the most frequent patterns and implement these.
  • Threaded dispatching. So far the interpreter has been implemented in a straightforward while-switch fashion. This creates a certain overhead, because we have to jump into the bytecodehandler, out of the handler and back to the loop condition. That’s pretty bad, especially considering that many opcodes are really cheap, and it tends to intercept the processor pipeline badly. The better way to do this is to jump directly to the next bytecode handler at the end of the previous one. That’s called threaded dispatching. Unfortunately, you can’t really implement this with ANSI C. So I did quite some macro magic and utilized a GNU C specific feature (label pointers). The outcome was quite impressive, ~15% improved performance in one go. After having gone through all this I was told by Ian Rogers that GCC has a -funswitch-loop optimization flag which makes a threaded dispatcher out of a while-switch dispatcher. Gotta compare it to my homegrown implementation. I really hope that mine does better than GCC’s one. I don’t like wasted weekends very much ;-)

I’d like to implement some more intersting stuff, like direct dispatching or a primitive JIT, but that’d require to be able to get hold of some contiguous memory, which isn’t exactly easy in Jamaica (due to the hard realtime nature of the GC). Let’s see how this works out in the end.
On a related note, I’ve been digging into the Hotspot and JikesRVM sourcecode lately, mostly out of curiosity (until now). I’m quite excited by the idea of a VM implemented mostly in Java itself, even hosting itself. It requires some magic, but in my experience, beeing able to do things in Java without needing to touch C makes development more rapid and tends to produce better and more readable code. And if you think a VM implemented in Java itself must be slow, think again.