Archive for the 'Escher' Category

Caciocavallo fonts

Thursday, April 17th, 2008

Another day, another non-existent problem solved :-D Seriously, the font problems I’ve seen yesterday have been kindof bad luck. Wrong font has been chosen by the pipeline, one that doesn’t render smooth as bold font. Plus some small problems in OpenJDK and/or FreeType regarding rendering this font, etc. Today I tried the Font2DTest program of OpenJDK and most fonts look just fine:

This is Bitstream Vera plain without anti-aliasing. Pretty smooth. The font we’ve seen yesterday has been FreeSans bold:

Which also looks slightly crappy with OpenJDK’s default pipeline:

BTW, it’s really amazing how well Swing applications already work. There are only very few glitches (no mouse drag yet for example) and performance seems reasonable (not quite on par with the OpenJDK X11, but still). And all this with minimal effort on my side, I really only implemented a handful of primitives and a little infrastructure. Yay OpenJDK Java2D architecture!

Caciocavallo prototype

Wednesday, April 16th, 2008

Today I finally solved a non-existant problem, and can finally show some stuff from the OpenJDK challenge work:

The demo already works quite well and performance is reasonable. Of course, there are still many quirks and etches, and the fonts look like crap. Not sure yet, why the fonts look so ugly, I have to find this out. This is using OpenJDK rendering pipeline and does all the rendering in pure Java (over Escher).

A cup of tea

Wednesday, April 16th, 2008

As Mario already mentioned, there has been some Escher goodness during the last couple of days. Both of us have been working on fixing Escher’s GLX implementation (which became broken after my big overhaul of the protocol transport layer). Today I went through the large parameter stuff in GLX, which is so horribly weird, I think I have some gray hair now ;-) Makes you wanna put a // NEVER EVER CHANGE THIS CODE AGAIN in there. Anyway, this means that textures are working now, I think this is pretty cool and the last big broken thing:

A cup of Escher teaTextures in Escher

Phew. I think now we can start with the JOGL layer on top of Escher, which hopefully enables us to run Jake2
:-D.

Escher peers progress

Thursday, July 19th, 2007

Yesterday I did the antialiasing, which is needed especially for TrueType font rendering to look nice. Today I fought a little with window manager hints. I found it a little surprising that there’s apparently no standard way to open undecorated windows. Most toolkits seem to rely on Motif proprietary hints. There’s the other two options to make a Window ‘override-redirect’, which creates a Window without decorations, but also one that sticks in front of all other windows, or you can use one of the logical hints in the Extended Window Manager Hints spec, in this case I declare the window as TOOLBAR, which isn’t semantically correct, but does the job, at least under GNOME’s window manager.

Anyway, all this makes Swing apps look much better now (except for icons):

And it’s even quite snappy, considering that almost everything is done in Java.

BTW, I consider to attend the X developer summit 2007, and probably do a presentation about Escher. Not sure if I can make it though.

Today

Friday, July 13th, 2007

.. I got linked on the java.net frontpage. Wow. And somebody has linked a whole bunch of older posts into community.java.net/openjdk and even bother to make a small icon for me. Am I famous now? ;-) Ok, seriously. I enjoyed all those good comments, as well as one really moron-ish^Whumorous comment (duh. I have a real life, man! but I really shouldn’t answer to such stupidity.). The general consensus seems to be that an open Mercurial repo would help alot. That is my opinion too. I’m sure this is beeing worked on. From this point it has to be seen if it is worth to setup some processes to make code flowing from the community to OpenJDK less painful.

(BTW, the best comment I received offline via private email from Christopher Oliver - sorry Chris I gotta quote that - “Perhaps, if you worked for Sun that would remove a lot of those stumbling blocks. ” Hehe, right. That would probably help me, but not the community at large. But yeah, this is a little out of context and I probably shouldn’t quote from private email anyway.)

Back to real life, today I hacked and cleaned up Escher’s image handling (still no idea about XYPixmaps though) and implemented VolatileImage and DataBuffer on Pixmap and ZPixmap (server vs. client side images in X, respectively). This is an important step forward for the Escher peers. Now I only need to think up some cool tricks to do that for BITMASK and TRANSLUCENT VolatileImages and BufferedImages too, which ain’t that easy because (plain) X doesn’t support translucency for itself.

Improving Anti-Aliasing

Saturday, May 26th, 2007

I was able to improve the antialiasing rasterizer significantly. So far I calculated the coverage on ‘intersection pixels’ by scanning each scanline 4 times (or any other power of 2 for that matter) and counting the intersections within each pixels. The coverage (delta) for each pixel is then calculated as count / max. For example, if we count 4 intersections in a pixel when we did 8 scans, then we have a coverage delta of 0.5 for that pixel.

So far so good. But it quickly turned out that this approach is flawed. Imagine for example a straight vertical line right through the middle of a pixel. The above calculation would result in the maximum number of intersections, and thus the coverage delta is 1.0 rather than the expected 0.5. Visually this looked like vertical edges are ‘cut off’, in contrast to the horizontal edges which are smooth.

Luckily this problem can be solved without any penalty. I only needed to account for the actual intersection point inside the pixel. So now I not only count the intersections, but also add up the ‘fractions’ that the intersections create. I can increase the accuracy by a factor of 16 (or more, but I reserve only 4 binary digits accuracy for the ‘horizonal’ coverage, and leave up to another 4 digits for the ‘vertical’ coverage). Of course, the actual math behind all that is slightly more complex than this, but you get the idea.

This makes the rasterizer good enough for rendering arbitrary shapes smoothly. Even TrueType glyphs can be rendered nicely with it. The following screenshot uses 4x scanning only with 16x horizontal resolution.

I proposed this rasterizer as possible replacement for the encumbered graphics rasterizer in OpenJDK. Let’s see if it turns out to be feasible. If yes, then it’s going to be interesting to solve the legal issues around it. I think the worst (and likely) case would be to do the copyright grantback dance with the FSF and provide the code to OpenJDK via the SCA. If not it’s going to be used anyway in the Escher AWT peers, in the JamaicaVM graphics backends and font rasterizer and hopefully soon in the Odonata toolkit (which has been proposed as new OpenJDK project, which is an exciting idea for itself).

I’m off now for two weeks, celebrating my birthday tomorrow and my marriage in a week, hurray! Must be my lucky year. Absolutely incredible so far. :-D

Efficient Anti-Aliasing

Friday, May 18th, 2007

These days I did what I wanted to do for a long time, that is implement an efficient anti-aliasing for the rasterizer of the Escher peers (this rasterizer can be used in arbitrary other scenarios as well though). This was mainly triggered by OpenJDK’s need for a replacement for their encumbered rasterizer, and I hope that mine is probably a good fit. Let me outline how it works.

Before doing anti-aliasing, let me show how shapes are generally rendered to a raster in general:

Rendering a glyph

Basically, we scan the shape line by line, and at the intersection points we ‘turn on’ or ‘turn off’ the pixel at that intersection point. In order to do this efficiently, we first sort the edges of the shape according to their Y and X coordinates.

Ok, but what about anti-aliasing? Basically it’s the same, only that we scan each line more than once. Usually each line is scanned a -power-of-2 times, for example 8. Depending on how often we intersect the shape in a pixel, we can calculate a coverage value. This coverage value is then used as alpha value when filling the actual pixel.

The interesting part here is how the coverage values are managed.  A simple and naive approach is of course to store them in an array, one entry for each pixel. Now if we scan a line we proceed just like in non-AA rasterization and turn on (add 1) to the entry in this array that corresponds to the pixel when we ‘enter’ the shape, do the same for all pixel entries in between until we ‘leave’ the shape for that scanline. This is repeated for the same pixel line 2^N times. In the end we have an array that has the correct coverage values for each pixel.

Of course, iterating over the array and adding up the coverage entries isn’t exactly efficient. A better idea is to only store deltas. So, instead of adding 1 to all the pixel entries ‘inside’ the shape, we only add 1 at the entry pixel and substract 1 at the exit pixel. This is much better.

We must still iterate over the whole array though when we want to display it to the destination surface, to find consecutive chunks of equal pixels. Imagine that for a simple 600 pixels wide rectangle. Pretty bad. So what I did today and the last couple of days is design a datastructure that manages the coverage values in a much more efficient way. Basically, it is a simple linked list of deltas together with their X coordinates. This list is always sorted according to the X coordinates. Each time we scan a line, we insert one entry into that list for each intersection point. In order to find the insertion point, we do a linear search. A linear search? Haven’t we all learned that this is pretty inefficient. Yes. But here we make use of the fact that the scanline is always scanned in a linear fashion, from left to right, and that the list is always sorted. The data structure has a pointer to the last item that got inserted, and starting from there, it is not possible (or at least highly unlikely) that the next insertion point is more than 2 items away. When the next scanline inside the same Y pixel row is scanned, we ‘rewind’ the data structure to point to the first item again, and start searching there. Finally, when doing the actual rendering, we can simply iterate this list, and fill complete segments in one go. This can be quite efficiently implemented on most backends (for example, for a framebuffer it’s basically a simple memset operation when the color is opaque).

A nice side effect of this technique is that non-anti-aliased rendering can now be handled as a special case of anti-aliased rendering, with no performance penalty. Another little detail of the implementation is that the datastructure doesn’t throw away and re-allocate its entries, but keeps unused items for later reuse. This way, the scanline converter usually needs allocation, after some initial setup of its tables and structures.

Escher by numbers

Monday, May 7th, 2007

I did some preliminary and very simplistic benchmarks with the current development version of Escher vs. the released version 0.2.3.

The benchmark is doing 1,000,000 operations per run (e.g. 1,000,000 lines or rectangles) with random color and random coordinates.

This is measured using JDK6 Hotspot VM.

Test SVN Local SVN TCP 0.2.3 TCP
lines 6765 7938 9291
rects 6812 8274 9363
fill rects 58072 59625 113316

For comparison, the following is measured with JamVM:

Test SVN Local SVN TCP 0.2.3 TCP
lines 31992 35851 232868
rects 32240 36250 232809
fill rects 59171 59746 231549

Update : I’ve written an equivalen benchmark for AWT and did a test on JDK6:

lines 6240 ms
rects 6332 ms
fill rects 44512

That’s still slightly better than Escher, but pretty close already. With some more protocol enhancements and the usage of NIO, there is hope that Escher can catch up with JDK6 performance though.

Escheritis

Friday, May 4th, 2007

I almost finished the first step in my attempt to make Escher more efficient. This was by far the biggest part I supposed. I had to restructure almost all of the Escher classes to use the new protocol implementation. What I did was this: The old protocol implementation created a new Request object for each little request (unless, when multiple requests, like many lines one after the other, get collapsed), each with its own buffer. This buffer then got copied to the socket. You can imagine that creating and disposing buffers while doing heavy rendering (like many simple things like rectangles and lines) put quite a heavy load on the GC. The new implementation uses only one buffer (ok, two; one for input and one for output), and the protocol implementation writes into this one buffer. There is no (de-)allocation at all now. However, after the core of Escher has been adjusted this way, I needed to go through all classes and adjust every bit of Escher to this new implementation. This is now finished and SVN trunk now completely compiles again (it didn’t for a couple of months).

The next step is to shake out any bugs that sneaked in with this rewrite. After this, I might release a new version of Escher.

Things I am pondering to implement for further performance improvement are:

  • Remove synchronization. The idea is that programs that don’t do multithreading shouldn’t be punished by the synchronization. And, after all, applications usually can do this for themselves much more effectively (for example, locking based on ‘frames’ rather than for each graphics primitive). I am not sure if I should ditch the synchronization altogether, or if I should put in some lock() and unlock() methods, which are no-ops when no locking is required, or which do locking via java.util.concurrent.locks, if an application chooses to do so. But my guess is that this not worth the effort or even couterproductive on non-optimizing VMs, a method call usually beeing more expensive than a monitor enter/exit.
  • Implement the communication using NIO Channels and Buffers. This way, the above described buffer would be a direct ByteBuffer and this could be sent directly to the X server, without the hidden copy that is usually made by the SocketOutputStream. This would require to extend the LocalSocket implementation in classpath to implement SocketChannel. I think this shouldn’t be too hard. Communicating over a local socket channel to the X server is almost or exactly like using shared memory area. (My guess is that the buffer is passed through the kernel directly to the receiving process to read from, but I might be wrong here). Either way, this should certainly maximize rendering throughput for OpenGL.

Escher ideas

Wednesday, April 11th, 2007

During the last couple of weeks I worked on Escher only every now and then. I re-architectured the protocol implementation so that the object creation is minimized. Before this change Escher created one small array per request, which can have a pretty noticable impact when rendering. Now there is only one buffer into which all requests are written. This brings a pretty nice performance gain. I’m finishing this step off now (this will take another couple of days or weeks even). However, I’m happy that I have a little more spare time now (finished the study, yay!) and can continue on Escher with more drive now. Here’s a bunch of ideas what I could do in the future:

  • Avoid even more potential copies by using NIO channels. A socket implementation might require copying the passed-in byte[] on the native side. NIO channels allow to pass in a ByteBuffer which can be backed by the correct native data structure already, thus avoiding the need to copy the buffer from a Java array to a native array. Going the NIO route seems to have a couple of other advantages too (e.g. avoid polling for inbound traffic).
  • Right now, Escher is made to be completely thread safe. This is basically the same mistake as was designed into AWT. The problem is that many applications don’t need this thread safety at all and could rather implement single threaded rendering and avoid acquiring locks altogether. Another point here is that applications that do render using multiple threads usually can do the locking much more efficient, e.g. acquire a lock once, render a complete frame (or some other logical unit) and release the lock once, rather than acquiring and releasing the lock on each little graphics primitive. The plan here is to drop thread safety on outbound X traffic (e.g. rendering) and only protect inbound traffic (because this is most often handled by a separate (event) thread anyway AND the main thread (for X replies).
  • Make Escher implement the JOGL interfaces (javax.media.opengl.GL). This way, Escher could be used as a direct replacement for the standard JOGL implementation on X based systems. It is my hope that this implementation is at least as efficient as the original JOGL one, or even more efficient because native calls are avoided altogether (or at least, performed more seldomly, that is when the buffer runs full and is flushed to the network).
  • Add tesselation (and other utilities). Without tesselation, OpenGL isn’t able to render complex shapes at all. This feature can most likely be provided by the plain JOGL classes and don’t need to be rewritten. This would go hand in hand with the previous point.

Furthermore, I hope I can put more work into the Escher based AWT peers and polish them up for real use. Most importantly, this requires better TrueType font support (by the Java font impl in GNU Classpath), better AWT widget support (by the Swing backed AWT widget set) and a shitload of bug fixes in the peers and Escher itself.