Friday, June 13, 2008

Introducing GRML

Written by Toby J. Rhodes

Creating a new markup language.

Introduction.

General Reuse Markup Language, or GRML, is a markup language for web browsers. It has the data definition features of character-delimited files and XML, with the hyperlinking and form support of HTML.

The purpose of this article is to show why GRML exists and how it complements HTML, XML, RSS, and character-delimited formats.

Background.

GRML is not the result of a specific plan. It was developed as a solution to another problem, namely reusing data from a web service. It began with the development of a web front-end to request content from a few web services. A data format was needed to handle responses. Having data in some arbitrary format was too limiting. Something formal was needed.

HTML and XML were considered, but they did not quite fit the front-end being developed. There needed to be another choice, one with...

support for multiple views (the front-end used a List control that has 4);

a way to define multiple sets of data for multidimensional views;

content that translates to/from other formats; and

a distinction between the display of the form and view.

Since there was no format that met all the requirements, the front-end was going to need something new. Using the front-end, it was possible to develop a format and test it for these requirements. In other words, the front-end existed before the markup language!

The format that resulted was GRML. It was designed to use forms and views, supports multiple and multidimensional views, works with existing web servers, and adapts to other formats. Once the markup language was finished, the web front-end became a web browser.

Now that the objective for GRML has been explained, the next step is to understand, in detail, why existing formats were not chosen.

Understanding Markup Languages.

Before explaining why GRML is necessary, the existing formats need to be introduced and their design goals identified. The formats are considered from a data handling perspective, so games, movies, music, advertising, and entertainment are not discussed.

For the purposes of creating a markup language, the two major features for browsing web pages are the form and the view. A form contains the input controls a user needs to make requests. A view displays content, that is, the data from the web page without markup tags or formatting elements.

Given the requirement of the form and view, it is possible to compare each format.

HTML is the most prevalent format on the web. It is designed for data display. There is form and view support.

XML is a minor format on the web. It is designed for data definition. It lacks form and view support.

RSS is a minor format on the web. It is designed for data definition. It lacks form support but has a view.

CSV or character-delimited formats are rarely used on the web. They are designed for data definition. They lack form support but have a view.
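The comparison above can be written down as a small table in code. This is only a sketch in Python; the `Format` class and its field names are made up for illustration and are not part of any of the formats discussed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Format:
    """One row of the comparison (hypothetical helper, not a real API)."""
    name: str
    designed_for: str  # "display" or "definition"
    has_form: bool
    has_view: bool


# Values taken directly from the four descriptions above.
FORMATS = [
    Format("HTML", "display", has_form=True, has_view=True),
    Format("XML", "definition", has_form=False, has_view=False),
    Format("RSS", "definition", has_form=False, has_view=True),
    Format("CSV", "definition", has_form=False, has_view=True),
]


def has_form_and_view(fmt: Format) -> bool:
    """True when a format offers both a form and a view."""
    return fmt.has_form and fmt.has_view


# Formats that define data AND offer both a form and a view.
candidates = [
    f.name for f in FORMATS
    if f.designed_for == "definition" and has_form_and_view(f)
]

for f in FORMATS:
    print(f"{f.name:5} form={f.has_form!s:5} view={f.has_view!s:5} for={f.designed_for}")
print("data-definition formats with form and view:", candidates)
```

Only HTML has both a form and a view, and it is the one format designed for display rather than data definition, so the `candidates` list comes out empty.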

Now that each format has been introduced, it is possible to understand the place for GRML on the web.

Let's begin with...

HTML.

There is really only one markup language in widespread use on the web (nearly all web pages use it), and that is HyperText Markup Language, or HTML. HTML describes how data is displayed. It tells the web browser how the web page looks in the web browser view. With HTML, all content is displayed in the view, including forms, text, and images. HTML decides how to display the web page.

Web page content, using HTML, is defined only for images and hyperlinks. Text content is not defined, which makes it hard to reuse in other formats. Therefore, HTML content is the hardest to adapt of all the formats considered.

The single view approach of HTML prevents dynamically switching the content in the view. There is no way to present related sets of HTML content (e.g. 2 different pages from a message board, or 4 different pages of news headlines, or 8 different pages of auction results, etc.) in the view without loading different pages and navigating between them. Hence, HTML does not support multidimensional views.

Because HTML decides the web page display, it prevents multiple views of content. HTML does not support multidimensional views and is not easy to adapt to other formats. Also, it combines the form and view in one display. For these reasons, it proved to be an inadequate choice.
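To make the point about undefined text content concrete, here is a minimal sketch using Python's standard `html.parser` module. The sample page and the `ContentCollector` class are invented for this example. Links and images can be picked out by tag and attribute, but the text arrives as anonymous fragments with nothing to say which piece is a headline and which is body copy.

```python
from html.parser import HTMLParser

# A made-up page, used only for illustration.
PAGE = """
<html><body>
<h1>Headlines</h1>
<p>Rain expected <b>Friday</b>. <a href="/weather">Forecast</a></p>
<img src="/radar.png" alt="Radar">
</body></html>
"""


class ContentCollector(HTMLParser):
    """Collects hyperlinks, images, and raw text fragments from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.images.append(attrs["src"])

    def handle_data(self, data):
        # Text has no name attached; all we can do is keep the fragment.
        if data.strip():
            self.text.append(data.strip())


collector = ContentCollector()
collector.feed(PAGE)
collector.close()

print("links: ", collector.links)
print("images:", collector.images)
print("text:  ", collector.text)
```

The tags around the text (`h1`, `p`, `b`) describe how it should look, not what it is, so a program reusing this content has to guess.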

Next is...

XML.

XML, or eXtensible Markup Language, is designed for adaptability. Databases, spreadsheets, CSV, or character-delimited files are all potentially able to format their data using XML. It defines what data is, rather than how it is displayed. This makes XML adaptable to other file formats.

There is no one XML document format. It is a standard for defining how to structure data. This lack of a specific data format prevents XML from defining any view of its content. It also does not define input controls for use in a form.

A lack of view support in XML prevents multiple AND multidimensional views. Without form support, a user is not able to send requests. While XML is adaptable to other formats, it is not an adequate choice.
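As a rough illustration of that adaptability, the sketch below reads a small XML document with Python's standard `xml.etree.ElementTree` module and rewrites it as CSV. The `auction`, `item`, `title`, and `price` element names are invented for this example; XML itself does not prescribe them, which is exactly why a browser cannot know how to show them.

```python
import csv
import io
import xml.etree.ElementTree as ET

# A made-up document; the element names are assumptions for illustration.
DOC = """
<auction>
  <item><title>Lamp</title><price>12.50</price></item>
  <item><title>Desk</title><price>80.00</price></item>
</auction>
"""

root = ET.fromstring(DOC)

# Each <item> becomes a dict keyed by its child element names.
records = [{child.tag: child.text for child in item} for item in root.findall("item")]

# Because the data is named, converting it to another format is mechanical.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "price"])
writer.writeheader()
writer.writerows(records)
csv_lines = buffer.getvalue().splitlines()

print(records)
print(csv_lines)
```

Nothing in the document says whether the items belong in a list, a table, or a form, so the conversion works but a view does not come for free.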

So far, HTML and XML have proven insufficient. The next to consider is...

RSS.

RSS, or Really Simple Syndication, is a specific data format built on an XML data structure. Therefore, RSS is able to support a view of its data. Also, since it is based on XML, it defines its data rather than how it is displayed. View support with data definition means that RSS supports multiple views of its content.

As an XML format, RSS lacks any form support. Input controls do not exist using XML, hence are missing from RSS. For this reason, it is not sufficient.
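Because the RSS item structure is fixed, the same feed can be shown more than one way. The sketch below parses a tiny made-up RSS 2.0 feed with Python's standard library and produces two views of it. The `list_view` and `report_view` names are hypothetical and only loosely echo the List control mentioned earlier.

```python
import xml.etree.ElementTree as ET

# A made-up feed using the standard RSS 2.0 channel/item/title/link layout.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>http://example.com/1</link></item>
    <item><title>Second story</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

# Pull out (title, link) pairs; the element names are known in advance.
items = [
    (item.findtext("title"), item.findtext("link"))
    for item in ET.fromstring(FEED).iter("item")
]


def list_view(entries):
    """Titles only, one per line."""
    return [title for title, _ in entries]


def report_view(entries):
    """Title and link side by side."""
    return [f"{title} <{link}>" for title, link in entries]


print(list_view(items))
print(report_view(items))
```

What the feed cannot carry is an input control: there is no element for a text box or a button, so the user can read but not ask.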

Only one format remains, and it is...

CSV or character-delimited.

CSV (comma-separated values) or character-delimited formats are used by databases, spreadsheets, and many other data-oriented applications to store information in files. It is a format that is adaptable to other formats because it does not use any display tags. The format consists almost entirely of content, except for the character used for the delimiter.

This format has a view because it is almost entirely content and lacks markup tags. Its focus on content means that it is the most reusable of any format considered. No display tags are used, so it supports multiple views.

The lack of data definition tags means there is no way to distinguish between sets of data. Hence, CSV or character-delimited files do not support multidimensional views. In addition, it is not possible to define input controls for a form. This means no form support.
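The problem with sets of data shows up as soon as two kinds of content share one file. In this sketch, which uses Python's standard `csv` module and made-up sample rows, auction results and news headlines are read back as one undifferentiated list.

```python
import csv
import io

# Two auction rows followed by one headline row (sample data for illustration).
DATA = "Lamp,12.50\nDesk,80.00\nFirst story,http://example.com/1\n"

rows = list(csv.reader(io.StringIO(DATA)))

for row in rows:
    print(row)

# Every row has the same shape, and nothing names the set it belongs to.
same_shape = all(len(row) == 2 for row in rows)
print("all rows look alike:", same_shape)
```

A reader would have to rely on a convention outside the file, such as a separate file per set, to tell the two apart.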

Therefore, this format is an insufficient choice. This is why it was necessary to create...

GRML.

GRML defines the form and view separately. Input controls for a form are defined separately from content used in the view. Also, content is defined explicitly in GRML, with text defined separately from hyperlinks and images. Display tags do not exist in GRML. The web browser decides how to display the web page. This means support for multiple views.

Using data definition tags allows GRML to be adaptable to other formats (HTML, XML, RSS, CSV or character-delimited). It also enables different sets of content to be named, which means support for multidimensional views.
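The ideas in this section can be modelled in a few lines of Python. To be clear, this is not GRML syntax, which this article does not show. The `Entry` and `Page` classes, their fields, and the view functions are all assumptions made up to illustrate form/view separation, explicit content, and named sets.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One piece of content, with text, hyperlink, and image kept apart."""
    text: str
    link: str = ""
    image: str = ""


@dataclass
class Page:
    """A page model: input controls (form) are separate from content (sets)."""
    form: list = field(default_factory=list)   # names of input controls
    sets: dict = field(default_factory=dict)   # set name -> list of Entry


# The browser, not the page, chooses how content is displayed.
def list_view(entries):
    return [e.text for e in entries]


def report_view(entries):
    return [(e.text, e.link) for e in entries]


page = Page(
    form=["search"],
    sets={
        "headlines": [Entry("First story", "http://example.com/1")],
        "auctions": [Entry("Lamp", "http://example.com/lamp", "lamp.png")],
    },
)

# Multiple views of one set, and more than one named set on the same page.
print(list_view(page.sets["headlines"]))
print(report_view(page.sets["auctions"]))
print(sorted(page.sets))
```

Switching between `headlines` and `auctions` needs no new page load in this model, which is the sense in which named sets allow a multidimensional view.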

Conclusion.

After considering all the available formats for a markup language, each lacked at least one of the listed requirements. None met the design goals of the web front-end. Therefore, it was necessary to create a new format, GRML.

Quick Reference.

HTML is used with multi-form, single-view, one-dimensional, display-oriented web browsers.

GRML is used with single-form, multi-view, multidimensional, data-oriented web browsers.

RSS is used with no-form, single-view, one-dimensional, data-oriented web browsers.

About The Author

Developing with MFC for a couple of years now. Working at getting my new web browsers just right. Take a look at GRMLBrowser.com.

Living in Memphis, TN and it is great coz there are absolutely no major sports teams (well, except for the Grizzlies).
