We live in a world of false superlatives. Every damn new thing is the latest, greatest, fastest, etc. Then you bring your bright shiny new widget home from the store, and maybe it works great for about 5 minutes before falling apart. Nowhere has this disappointment been greater than in the world of computers. So we have become jaded.
Time to lose your funk. Web 2.0 is on the way, and the Open Document format is the horse that it will be riding in on, and the name of that horse is OpenOffice.org 2.0 stable version. (Plus the LAMPPP stack. Okay, so Web 2.0 will have more than one horse.) Very soon, Joe Sixpack will have a more viable alternative to Microsoft Office for performing some of the most important, if mundane, functions of desktop computing anywhere, anytime, on any desktop. Word processing. Spreadsheets. Presentations. All free as in beer, and free as in freedom. And while it is not actually clear to what extent Google will be providing access to OpenOffice.org over the web, it is very clear that the clean XML upon which OpenOffice.org 2.0 is based will allow interoperability among modular, loosely coupled, disparate computer systems through service-oriented architectures (SOAs) in ways that we can now only begin to imagine.
Think SOA is just a boring acronym? Think again. SOA is what the collaboration of Web 2.0 is all about. As Kevin Kelly said in a recent Wired article, the Internet is probably going to become the first form of artificial intelligence. You will soon be able to share data generated 50 years ago across computing applications that are being written today. The possibilities are staggering.
Uncle Vic, the scowling Mad Penguin™ mascot, is so pumped about this new development that Mad Penguin™ will be running a series of three interviews with people who are in the trenches in the work to bring out OOo 2.0. The first of these interviews, with Florian Reuter, covers some of the differences between the truly open XML found in OOo 2.0, and the closed MS Word ML found in the upcoming Microsoft Office 12, as well as the importance of simple end users in the process of improving the code with bug reports.
OOo 2.0 is really different from Microsoft Office in a way that makes a difference. Be sure to check back here with Mad Penguin™ as we approach the formal release of the stable version of OOo 2.0 to find out more about where this powerful new office suite is going, and how it will change the way that humanity communicates.
Mad Penguin: Thanks for talking with me this evening, Florian. Could you please tell us your name and your role with Sun?
Florian Reuter: My name is Florian Reuter. I'm from Hamburg, Germany, and I'm a software developer. I work with the StarOffice team, which is a major contributor to the OpenOffice.org source base. I'm responsible for the import and export of the .doc format, the .rtf format, and the Wordprocessing ML format from Microsoft to OpenOffice.org. I also maintain the Open Document filters of the OpenOffice.org Writer.
MP: How did you get into this field? Tell us about your background.
FR: Yeah, I was studying at Kiel. I worked on XML validation, XML language binding, and so forth. I'm from the web service architecture side. After graduating, I applied for several jobs, and the offer from Sun was very good, because it lets me work with the OpenOffice.org community. And so now I've been with Sun for seven months.
MP: We've all heard so much about OOo 2.0. What are you personally looking forward to most about the stable release of 2.0?
FR: Well, before we get into that, I only joined Sun recently, as I mentioned earlier, and all the work which is in the current version of OpenOffice.org has been contributed by my colleagues and by the community. They did a lot of the work on the current .doc and .rtf formats, and they really did a great job. And I'd also like to mention a really cool feature called “X-forms” support in OpenOffice.org 2.0.
MP: Who is working on X-forms?
FR: Lars Oppermann is now focused on X-forms. He's also a member of the Open Document TC. So, Lars is responsible for X-forms and I'm responsible for the import and export of Microsoft formats. By the way, please also let me mention that our OpenOffice.org names are “lo” and “flr” respectively, in case you want to assign bugs ;-)
It was very visionary to include X-forms in OOo / SO Writer. Our web services / service-oriented architectures will be much better as a result of that work. It will continue evolving, though, and we will see a huge impact as a result of being able to have a formula engine in OpenOffice.org Writer.
MP: There are a number of people in the OpenOffice.org community, including Gary Edwards of OpenStack.us, who believe that X-forms is where OOo 2.0 not only achieves parity with Microsoft Word, but in fact exceeds Word's performance. What do you think?
FR: The first precondition is, I believe that web services and service-oriented architectures will be a big deal. Many of the bug docs that I received from the community are tables which encode formulas, and I'd like to give an example. Everybody has filled out a vacation request form to submit to their boss. You have to fill out the starting date, the end date, how many days you want to be gone, and who is responsible when you are gone. You first have to get the form, you download it, you print it, you fill it out. Then you mail it or you get it to your boss in some other way. Then the request gets granted, then somebody has to maintain the database of how many holidays you have left, and so forth. It is slow and inefficient.
With web services and service-oriented architectures and X-forms, this process will be entirely different. You'll download the form from your company's website, fill it out, and press the submit button. The data will be sent to a web server which maintains the holidays left, and everything will get done automatically. It will tell you if you have enough days left, a notification will be sent to the person who has to approve the holiday application, and the whole process will be much smoother. This is how web flow will be done more and more over the next year or two. Having support for the end user this way will be a big deal, and will change how we think of collaboration with forms.
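To make the workflow Florian describes a little more concrete, here is a minimal Python sketch of what the server side of such a vacation service might do. Everything in it is an assumption made for illustration: the `VacationRequest` fields, the `handle_request` function, the dictionary of remaining days, and the choice to count calendar days rather than working days. It is not how OpenOffice.org, StarOffice, or any particular web service implements this.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class VacationRequest:
    """Hypothetical form data, mirroring the fields named in the interview."""
    employee: str
    start: date       # first day of leave
    end: date         # last day of leave, inclusive
    substitute: str   # who is responsible while the employee is away


def days_requested(req: VacationRequest) -> int:
    """Count calendar days, inclusive of both ends (a simplification)."""
    return (req.end - req.start).days + 1


def handle_request(req: VacationRequest, days_left: dict, approver: str) -> dict:
    """Check the request against the employee's balance and decide what happens next."""
    needed = days_requested(req)
    remaining = days_left.get(req.employee, 0)
    if needed <= 0:
        return {"status": "rejected", "reason": "end date is before start date"}
    if needed > remaining:
        return {"status": "rejected",
                "reason": f"only {remaining} days left, {needed} requested"}
    # Enough days are left: the request waits for the approver, who gets notified.
    return {"status": "pending", "notify": approver, "days": needed}
```

A submitted form would be turned into a `VacationRequest` and passed to `handle_request` together with the balance table; the returned dictionary tells the caller whether to reject the request immediately or to notify the approver.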
MP: Sounds like smart forms on steroids.
FR: The special thing about X-forms in OpenOffice.org is that it unites two worlds. You have the world of automatic processing chains of web services of J2EE servers, and they process pure, clean XML documents. The user wants to see the formula. He does not want to see raw XML tags. It's confusing and distracting. The question is, how can an end user more elegantly and more simply produce a document like this? With X-forms in OpenOffice.org and StarOffice, a simple end user is offered a bridge. A normal end user, who is accustomed to filling out forms, is now capable of producing the clean XML documents that are needed for automatic processing chains. That's a big deal. You have a bridge.
Everybody who wants to participate in this industry needs to answer the question, “How are you going to bridge the end user into the process?” We have a powerful answer to that question. X-forms is a W3C standard, which basically describes the mapping between the raw XML documents and your formulas. This can be pretty sophisticated. For example, if your holiday is longer than two weeks, you need to make a case to someone up the chain from you; otherwise, you don't have to make that case. You can model some pretty complex formulas into X-forms.
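The two ideas in that answer, mapping what the user typed into clean XML and attaching a conditional rule to it, can be imitated in a few lines of Python. This is only a rough sketch under stated assumptions: the element names (`vacationRequest`, `days`, `justification`), the helper functions, and the 14-day reading of "two weeks" are all invented here. In X-forms itself such rules are declared in the form's model rather than coded by hand, so this shows the concept, not the standard's syntax.

```python
import xml.etree.ElementTree as ET


def build_instance(fields: dict) -> ET.Element:
    """Turn the values a user typed into a form into a small, clean XML document."""
    root = ET.Element("vacationRequest")  # hypothetical element name
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    return root


def justification_required(instance: ET.Element, limit_days: int = 14) -> bool:
    """The rule from the interview: a holiday longer than two weeks needs a case made."""
    return int(instance.findtext("days")) > limit_days


def is_valid(instance: ET.Element) -> bool:
    """A long request is valid only if a non-empty justification was filled in."""
    if justification_required(instance):
        return bool((instance.findtext("justification") or "").strip())
    return True
```

Calling `ET.tostring(build_instance({"days": 3}), encoding="unicode")` yields the plain XML that a processing chain would consume, while the user only ever sees the form fields.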
MP: So basically this means that OpenOffice.org and StarOffice are making it easier for the simple end user to create smart forms?
FR: Yes, but it's even more than that. X-forms makes it possible for the end user to participate in service-oriented architectures. You can have a cool web service which handles all of the vacation needs of the company, but if your end users aren't able to easily access the service, it's useless. These X-forms, which are embedded into OpenOffice.org, are the answer to this problem.
MP: Microsoft obviously has its own XML, but it's not a genuine open XML as we know it from OpenOffice.org. Can you tell us in practical terms what the difference here is?
FR: So on the one hand you have the Open Document format that OpenOffice.org uses, and on the other hand, you have the Word ML-based file format which will be embodied in Microsoft Office 12. The Open Document file format, in my opinion, is clearly designed to fit the user's needs. Many people were involved in trying to figure out what users need in OpenOffice.org so that their data could be retrieved in 5, 10, 20 or 100 years or more. Data is their value, and they want their value encoded so that they have access to their value at any time. It's their data. If you talk to the people who were involved with the specifications for OpenDocument, you'll find that it was pretty hard for the Open Document to achieve that kind of openness. There were many independent people involved in the process. Gary Edwards can talk more about the process. What came out was a file format that really cares about your data. You can be pretty sure that your ideas are stored in a way that you can access them in the distant future.
From my point of view, there are two different aspects of designing XML. On the one hand, you ask “what is it that the user needs for their data storage.” Then you try to support that goal. The other aspect, Microsoft's aspect, is to look at the needs of the application, determine what it needs to do, then store it in Word ML. Open Document concerns itself with the needs of the user first, and the application second.
MP: You said earlier that you were helped out by end user bug reports. Many end users don't really understand how they fit into your world as a developer. They don't understand that there are developers out there who really do read their bug reports. What message do you have for simple end users about bug reports?
FR: If you have a Word document in .doc or .rtf or Word ML, and you use the current filter, and something goes wrong, even something not very noticeable, please submit the document as a bug document to OpenOffice.org, so that we can get a critical mass of documents that we can look at. When we do the investigation of the file formats, we do it in the following way. We look at the basic engineering approach. We look at what happens, and we make an assumption. Then you have to see if your assumption is correct or not. The more documents we have, the more we can test whether our assumption is correct. The real value to me as a developer is having a huge number of documents available so that I can check my hypothesis to see if it's correct.
MP: Let's break that down a bit. How does an end user help you out? What are the mechanical steps for submitting a bug document?
FR: Go to the OpenOffice.org website. Register as a user. You just have to pick a user name. Enter your email address. Click on issue. Click on submit a new issue. Select the WW8 filter. Attach the file, and submit the bug. If possible, please also provide a description of what is wrong with the document.
MP: How is an end user going to know that there is something wrong with the document so that they need to do a bug report?
FR: That's easy. If your document doesn't look right, just go ahead and submit it as a bug. Let us worry if it is a bug or not. Don't assume that you have made a mistake. It could be a bug.
If you use Windows, you can download a free Word viewer which will let you see what the document looks like. So you can download the viewer, view the document in Word, compare it to what OpenOffice.org looks like, and if it doesn't look right, then submit it as a bug.
MP: Give us a little more detail about the value of bug reports. You mentioned something earlier about having lots of tables, and how that was helpful. Could you elaborate on that a little more?
FR: You need to have a critical mass of documents to either prove or disprove your current hypothesis as to how the Microsoft layout formatting works. As a developer, you see the differences at the edges of performance. OpenOffice.org can import >90% of the documents correctly in 2.0. It is in the remaining 10% or 1% that some kind of magic happens, and you're not sure what it is. You can only figure out what magic is making that work by looking at lots of samples. It's only by seeing patterns in that small margin that you can get an idea of what makes the magic happen. You need to have users making unexpected use of the application to figure out what is going on. As a developer, it's hard to imagine all the different ways that something might be used. In computer science, having lots of samples helps us refine our hypotheses as to how to best import files from Microsoft Word. Real world samples make a big difference for us. It's that way in most fields of science. You need to have a critical mass of data to test your theory. So simple end users are really critical in the process of making the code better. It's important for end users to understand how important they are in the process. That's one reason why I enjoy working with the OpenOffice.org community. They help me do my job better. So thanks!
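The process Florian describes, forming a hypothesis about how a layout behaves and then checking it against as many real documents as possible, can be sketched in Python. The sample data and the width rule below are invented purely for illustration and say nothing about how Microsoft Word actually lays out tables; `check_hypothesis` and `support_ratio` are hypothetical helpers, not part of any OpenOffice.org tooling.

```python
def check_hypothesis(predict, samples):
    """Return the samples whose observed result differs from what the hypothesis predicts."""
    return [s for s in samples if predict(s["input"]) != s["observed"]]


def support_ratio(predict, samples):
    """Fraction of samples that agree with the hypothesis (0.0 for an empty corpus)."""
    if not samples:
        return 0.0
    failures = check_hypothesis(predict, samples)
    return 1 - len(failures) / len(samples)


# Toy corpus: (declared width, padding) and the width observed in the reference viewer.
# These numbers are made up for the example.
samples = [
    {"input": (100, 5), "observed": 110},
    {"input": (80, 0), "observed": 80},
    {"input": (50, 3), "observed": 58},
]


def width_hypothesis(pair):
    """Invented guess: rendered width is the declared width plus padding on both sides."""
    width, padding = pair
    return width + 2 * padding


counterexamples = check_hypothesis(width_hypothesis, samples)
```

The counterexamples are the interesting part: they are the small margin of documents where the guess breaks down, and each bug document submitted by an end user adds one more sample to test against.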
MP: Is there anything else that you would like to add?
FR: Yeah, I hope that OpenOffice.org 2.0 will be a great success. I love working with OpenOffice.org. It's great working with the community. I want to thank all the people who do the quality assurance of my work. It's really great to see how many people look at it, actually, and verify the work.