Archive for the 'Java2D' Category

Escher peers progress

Thursday, July 19th, 2007

Yesterday I did the antialiasing, which is needed especially for TrueType font rendering to look nice. Today I fought a little with window manager hints. I found it a little surprising that there’s apparently no standard way to open undecorated windows. Most toolkits seem to rely on Motif proprietary hints. There’s the other two options to make a Window ‘override-redirect’, which creates a Window without decorations, but also one that sticks in front of all other windows, or you can use one of the logical hints in the Extended Window Manager Hints spec, in this case I declare the window as TOOLBAR, which isn’t semantically correct, but does the job, at least under GNOME’s window manager.

Anyway, all this makes Swing apps look much better now (except for icons):

And it’s even quite snappy, considering that almost everything is done in Java.

BTW, I consider to attend the X developer summit 2007, and probably do a presentation about Escher. Not sure if I can make it though.

Today

Friday, July 13th, 2007

.. I got linked on the java.net frontpage. Wow. And somebody has linked a whole bunch of older posts into community.java.net/openjdk and even bother to make a small icon for me. Am I famous now? ;-) Ok, seriously. I enjoyed all those good comments, as well as one really moron-ish^Whumorous comment (duh. I have a real life, man! but I really shouldn’t answer to such stupidity.). The general consensus seems to be that an open Mercurial repo would help alot. That is my opinion too. I’m sure this is beeing worked on. From this point it has to be seen if it is worth to setup some processes to make code flowing from the community to OpenJDK less painful.

(BTW, the best comment I received offline via private email from Christopher Oliver - sorry Chris I gotta quote that - “Perhaps, if you worked for Sun that would remove a lot of those stumbling blocks. ” Hehe, right. That would probably help me, but not the community at large. But yeah, this is a little out of context and I probably shouldn’t quote from private email anyway.)

Back to real life, today I hacked and cleaned up Escher’s image handling (still no idea about XYPixmaps though) and implemented VolatileImage and DataBuffer on Pixmap and ZPixmap (server vs. client side images in X, respectively). This is an important step forward for the Escher peers. Now I only need to think up some cool tricks to do that for BITMASK and TRANSLUCENT VolatileImages and BufferedImages too, which ain’t that easy because (plain) X doesn’t support translucency for itself.

Improving Anti-Aliasing

Saturday, May 26th, 2007

I was able to improve the antialiasing rasterizer significantly. So far I calculated the coverage on ‘intersection pixels’ by scanning each scanline 4 times (or any other power of 2 for that matter) and counting the intersections within each pixels. The coverage (delta) for each pixel is then calculated as count / max. For example, if we count 4 intersections in a pixel when we did 8 scans, then we have a coverage delta of 0.5 for that pixel.

So far so good. But it quickly turned out that this approach is flawed. Imagine for example a straight vertical line right through the middle of a pixel. The above calculation would result in the maximum number of intersections, and thus the coverage delta is 1.0 rather than the expected 0.5. Visually this looked like vertical edges are ‘cut off’, in contrast to the horizontal edges which are smooth.

Luckily this problem can be solved without any penalty. I only needed to account for the actual intersection point inside the pixel. So now I not only count the intersections, but also add up the ‘fractions’ that the intersections create. I can increase the accuracy by a factor of 16 (or more, but I reserve only 4 binary digits accuracy for the ‘horizonal’ coverage, and leave up to another 4 digits for the ‘vertical’ coverage). Of course, the actual math behind all that is slightly more complex than this, but you get the idea.

This makes the rasterizer good enough for rendering arbitrary shapes smoothly. Even TrueType glyphs can be rendered nicely with it. The following screenshot uses 4x scanning only with 16x horizontal resolution.

I proposed this rasterizer as possible replacement for the encumbered graphics rasterizer in OpenJDK. Let’s see if it turns out to be feasible. If yes, then it’s going to be interesting to solve the legal issues around it. I think the worst (and likely) case would be to do the copyright grantback dance with the FSF and provide the code to OpenJDK via the SCA. If not it’s going to be used anyway in the Escher AWT peers, in the JamaicaVM graphics backends and font rasterizer and hopefully soon in the Odonata toolkit (which has been proposed as new OpenJDK project, which is an exciting idea for itself).

I’m off now for two weeks, celebrating my birthday tomorrow and my marriage in a week, hurray! Must be my lucky year. Absolutely incredible so far. :-D

Efficient Anti-Aliasing

Friday, May 18th, 2007

These days I did what I wanted to do for a long time, that is implement an efficient anti-aliasing for the rasterizer of the Escher peers (this rasterizer can be used in arbitrary other scenarios as well though). This was mainly triggered by OpenJDK’s need for a replacement for their encumbered rasterizer, and I hope that mine is probably a good fit. Let me outline how it works.

Before doing anti-aliasing, let me show how shapes are generally rendered to a raster in general:

Rendering a glyph

Basically, we scan the shape line by line, and at the intersection points we ‘turn on’ or ‘turn off’ the pixel at that intersection point. In order to do this efficiently, we first sort the edges of the shape according to their Y and X coordinates.

Ok, but what about anti-aliasing? Basically it’s the same, only that we scan each line more than once. Usually each line is scanned a -power-of-2 times, for example 8. Depending on how often we intersect the shape in a pixel, we can calculate a coverage value. This coverage value is then used as alpha value when filling the actual pixel.

The interesting part here is how the coverage values are managed.  A simple and naive approach is of course to store them in an array, one entry for each pixel. Now if we scan a line we proceed just like in non-AA rasterization and turn on (add 1) to the entry in this array that corresponds to the pixel when we ‘enter’ the shape, do the same for all pixel entries in between until we ‘leave’ the shape for that scanline. This is repeated for the same pixel line 2^N times. In the end we have an array that has the correct coverage values for each pixel.

Of course, iterating over the array and adding up the coverage entries isn’t exactly efficient. A better idea is to only store deltas. So, instead of adding 1 to all the pixel entries ‘inside’ the shape, we only add 1 at the entry pixel and substract 1 at the exit pixel. This is much better.

We must still iterate over the whole array though when we want to display it to the destination surface, to find consecutive chunks of equal pixels. Imagine that for a simple 600 pixels wide rectangle. Pretty bad. So what I did today and the last couple of days is design a datastructure that manages the coverage values in a much more efficient way. Basically, it is a simple linked list of deltas together with their X coordinates. This list is always sorted according to the X coordinates. Each time we scan a line, we insert one entry into that list for each intersection point. In order to find the insertion point, we do a linear search. A linear search? Haven’t we all learned that this is pretty inefficient. Yes. But here we make use of the fact that the scanline is always scanned in a linear fashion, from left to right, and that the list is always sorted. The data structure has a pointer to the last item that got inserted, and starting from there, it is not possible (or at least highly unlikely) that the next insertion point is more than 2 items away. When the next scanline inside the same Y pixel row is scanned, we ‘rewind’ the data structure to point to the first item again, and start searching there. Finally, when doing the actual rendering, we can simply iterate this list, and fill complete segments in one go. This can be quite efficiently implemented on most backends (for example, for a framebuffer it’s basically a simple memset operation when the color is opaque).

A nice side effect of this technique is that non-anti-aliased rendering can now be handled as a special case of anti-aliased rendering, with no performance penalty. Another little detail of the implementation is that the datastructure doesn’t throw away and re-allocate its entries, but keeps unused items for later reuse. This way, the scanline converter usually needs allocation, after some initial setup of its tables and structures.

Device Driver badness

Wednesday, February 7th, 2007

Today I implemented the VolatileImage subclass for WindML. The idea was to hold the image in the Video RAM, which should have the following advantages:

  1. Drawing into the image is much more efficient.
  2. Drawing (blitting really) the image onto the screen is much more efficient.

A first prototype used a normal bitmap in the RAM (not yet video RAM) and I was quite pleased that this worked after 2 hours work or so. This was equivalent to the offscreen image implementation that we had before in Jamaica (which was AWT 1.1 oriented and thus was a simple subclass of Image). However, I was really surprised when I changed some lines of code so that the VolatileImage was allocated in the Video RAM. The performance got much slower. It even got so slow, that you could see the image beeing copied onto the screen. So what was going on? Obviously the source and destination bitmaps (the onscreen and offscreen bitmaps respectivly) are placed inside the video RAM so you would suspect that blitting one on the other would be blazingly fast. Well, the only reason for this enourmous slowdown could only be that it is not the graphics chip performing the blit operation, but the CPU. When the CPU does something like this, it would have to pull the bytes from the source bitmap over the bus, and push them back to the destination bitmap, over the bus again. This is quite a mess on the bus going on I guess ;-)

I very much suspect that there is a problem with either with my understanding of the WindML API or with the device driver. The former seems quite unlikely, what use is a bitmap that can be allocated in the video RAM, when you can’t efficiently copy it to the onscreen bitmap? The latter seems more likely. I can guess that the driver supports something like real double buffering (as would be mapped into Java with a BufferStrategy), where a complete offscreen buffer is either copied or flipped. After all, the WindML offers a special API for that even. Supporting double buffering with a VolatileImage requires supporting to blit sub-images from offscreen to onscreen, which is probably slightly more difficult to implement because it would have to be done in h chunks of w pixels each. I’m going to test this on another box with different hardware and/or driver to confirm this or disprove this thesis.

If the API and/or driver would be Free Software I could quickly take a look for myself (and probably hack up the driver to support what I want), but oh well. The crux of proprietary software I guess…

Evolution Spam Filtering, BufferedImage again

Thursday, February 1st, 2007

Dear lazyweb. I am wondering how the spam filtering in evolution is supposed to work, as for me, it doesn’t. I have patiently marked all incoming spam as spam and non-spam as non-spam (which was easy, as never any single email got sorted out as spam so far). I am using two IMAP mailboxes. Does the spam filter not work on those? After all I think it would have to download all the mail body in order to filter it. But then again, I think evolution should tell me so, and not let me sort all my emails for weeks without any filtering. So, does spam filtering in evolution only work on POP or local boxes? Do I have to tweak some configuration option? Do I need any additional software besides spamassassin?

On a completely different note, I was slightly wrong with my last picture of the inner workings of BufferedImage. Dmitri from Sun’s Java2D team was so kind to explain in detail how this is supposed to work and how it works in their implementation (thanks Dmitri). Having two buffers and syncing them is supposedly not the way to go. My intention with this has been that we would have only one rendering engine doing all the rendering, but this is probably wishful thinking. If I understood Dmitri correctly there are (at least) 3 renderers for Java2D in Sun’s impl, one HW accelerated for Windows and VolatileImages, and 2 software renderers for BufferedImages (one generic and one with specialized impls for the known formats).

For Classpath this means that we should drop the two buffer scheme, let Cairo do what it can do (basically RGB and ARGB) and do the rendering on the remaining formats through the Java based renderer (which is suprisingly effective after my tweaking, but of course by far not as effective as a C based renderer could be, because in C I can access memory much faster without bounds checking etc). Hopefully soon we are able to utilize Sun’s implementations for the BufferedImage. Can’t wait to dig into this :-)

A better comparison

Wednesday, January 3rd, 2007

I did a run of GNU Classpath’s AWT and J2D benchmarks. The results are more fair now (JDK6 vs Cairo vs X/Escher). Cairo now beats the X peers, but the overall speed is still impressive given that it’s done all in java. I guess my own small benchmark was a nuance too artificial ;-)

Need for speed

Tuesday, January 2nd, 2007

Some more numbers for you as followup to my last measurements:

  • jdk6: ~750ms
  • my latest improved Escher/Java peers on Cacao: ~200ms

I wonder what Sun did to the shape renderer in jdk6. It’s quite slow compared to JDK5 and even much slower than the Escher peers. And that is without antialiasing just like the Java renderer. Probably this has to do with their work on the text renderer (which is even slower - ~1600ms - as it uses AA by default now).

It would be interesting to see how fast the Escher peers would run on hotspot. I’m thinking that a significant part of the performance gap between jdk5 and the escher peers is due to the performance of the VM itself as the escher peers and the shape renderer run comletely in java. Unfortunately it’s not trivial to port the peers to Sun’s JDK due to the few differences on the peer API. I’m still hoping that somebody will get hotspot running with GNU Classpath.

Raw Power

Saturday, December 30th, 2006

I rewrote the java based rasterizer that is used in the X peers of GNU Classpath. Heres some interesting numbers. Rendering a relatively complex shape (actually, the outline of a glyph vector) 250 times takes:

  • ~60ms on JDK1.5
  • ~780 ms on Cacao with the Cairo peers
  • ~350 ms on Cacao with the X peers and the new Rasterizer

Learned lesson: native code is not necessarily faster than java code. The whole rendering pipeline of the X peers is implemented in plain java. X only draws horizontal lines here.

The comparison with the JDK is not necessarily fair but gives an interesting impression what should be possible.

Find the test program here. Yes it is a microbenchmark that highlights one particular case (rendering complex shapes). I will make more interesting and comprehensive benchmarks as soon as I am able to run the Java2D and AWT benchmarks that ship with GNU Classpath.

faster faster faster!

Monday, July 10th, 2006

After some more tweaking in Escher and Twisti implementing a shiny new feature in CacaoVM (disabling assertion checking with -dsa/-ese,THANKS Twisti!) the Escher graphics performance could be nearly doubled again, coming closer to the JDK performance. Lesson learned: when putting asserts in performance critical code, make sure you disable them when doing real stuff. And never do assert read() == bla, or something similar, you can’t be sure that the read() is actually executed. Find updated useless stats here. For the sake of at least some fairness I also updated the Cairo results to account for the new CacaoVM. It doesn’t have such a great impact there though.