Overview

This document describes the MNV data format used to supply data to the EZChooser client. For a given set of items, the MNV data file defines attributes and values, much as a relational database table does. It goes beyond basic data definitions by also providing a means of defining presentation and typing information, as well as some initialization parameters. Presentation information might include, for example, the image associated with a given item and an associated URL link. Typing and presentation information can also be included for attributes and values. Parameter initialization in the data file is essentially an alternative to providing these parameter specifications in the applet HTML file, and may be more convenient in some cases.

This data format is an augmentation of the standard CSV (comma-separated values) format used, for example, by spreadsheet programs such as Microsoft Excel. In fact, EZChooser 1.1 parsers will accept an ordinary CSV file. Such a file specifies the basic data values, and the MNV Compiler program supplies default presentation methods and default initialization parameters.

The enhanced CSV format is called MNV (MultiNaV) format. It encapsulates the ordinary CSV data specification inside a set of <DATA> tags and allows two additional sections for rules and initialization. Thus there are up to three optional sections, appearing in any order, within the file:
Data Section

The data section of an MNV file must be enclosed in <DATA> </DATA> tags. The specification within the tags is identical to a standalone CSV file with no tags. As in standard CSV, fields are separated by commas (',') and each new line starts a new table row. Fields that contain commas within their value strings should be enclosed in double quotes (""). EZChooser interprets the first row as a header row, with each field giving a column name. Subsequent rows are data rows. Thus each column defines a distinct attribute (a.k.a. feature or dimension), while each row defines an item.

Each row in the data section should have the same number of fields. If a row has a different number of fields, the parser makes no effort to make up the missing fields (for example, by appending blank fields), and an error message is issued.

Data types

The data type of each dimension may be inferred from the values of that column. EZChooser supports the following data types: string, boolean, integer, float, currency, date. The rules for inferring dimension data types are as follows. They are fired in the specified sequence:
Data typing specified for the dimensions in the rules section overrides any type inferences described above.

Example data section

Here is a simple example:

<DATA>
Make,Model,Consumer Guide Recommendation,Class, Price
Chevrolet,C/K 2500/3500,No recommendation,full-size pickup, $15000
Chevrolet,Silverado 1500,Best Buy,full-size pickup, $17000
Ford,F-250/350 Super Duty,Best Buy,full-size pickup, $20000
Toyota,Tundra,Recommended,full-size pickup, $19000
</DATA>

This example is equally valid if no <DATA> </DATA> tags are used, since an ordinary CSV file is also accepted.

Rules Section

The rules section must be enclosed in <RULES> </RULES> tags. It provides metadata and presentation information for the tabular data in the Data Section. Functions include:
Basic rule syntax

The basic syntax for rules associates a key with a value. For example, to associate a dimension's text label with a particular text string, we would write:

dimension.4.label.text=Vehicle Class

The part to the left of the "=" sign is the key, and the part to the right is the value. Dot notation in the key signifies a hierarchical property decomposition in the usual sense. String values to the right of the "=" do not, in general, have to be quoted. Note that a rule has to be specified on a single line of the file. We have broken the lines in this document at times for readability only.

Reference to columns

Many rules allow references to a column in the data table. For instance, in the example rule above, the "4" refers to column 4 in order to identify a dimension. All column references are "1-based" (the count starts from 1 rather than 0). Column numbers may also be used in the right-hand-sides of rules, typically to specify item presentation values. For instance, to specify that the text label for items is to be found in column 2, you would write a rule like this:

item.label.text=2

If a column reference appears in a rule's right-hand-side, the parser assumes that the referenced column is not normal dimension data, and it will not be shown as such. That is, it will not be presented as feature data in EZChooser. However, if the user explicitly specifies a dimension in the left-hand-side of a rule, as in the example below, then it will be treated as a dimension. That is, because it appears in the left-hand-side of a rule, dimension 2 will be shown as an EZChooser row despite the fact that it also appears as a column reference on the right-hand-side of a rule.

item.label.text=2

String variables

For right-hand-sides of rules, there is a convention for substituting variables within strings.
If a value specification surrounds an integer with carets ("^"), it is interpreted as a reference to a column, and the value from that column is substituted into the string. An example is shown here:

item.label.image=carimages/^17^.jpg

The above rule states that for an item label's image (which takes a URL string as its value), the value in column 17 for that item is substituted in place of "^17^".

Typing

All dimensions are assigned a type. If a type is not explicitly assigned through a rule, then the type is inferred as explained in the data section. Rules can assign any of the types string, boolean, integer, float, currency, date, as in the following example.

dimension.7.datatype=integer

Presentation information

The overall space (data set) in EZChooser can have presentation information specified, as can items and dimensions. For items, presentation information is used in the lower half of the screen area, which lists the items that match the value restrictions in the dimension screen area (upper half). For dimensions, presentation information is used in the rendering of dimension (feature) rows. Dimension presentation includes the dimension labels, units of measurement, and cell values (text on buttons).

In general, "text", "image", and "url" presentations can be specified in both label and detail categories. There are also some special cases, such as icon (glyph) drawings and nouns to use in descriptions of the item sets. Note that not all fields in the spec are currently used by the EZChooser applet (Version 1.1), although we anticipate changes in the future. In particular, EZChooser Version 1.1 does not render information related to spaces or detailed presentations of any kind.

Item presentation

Here is an example of specifying item presentation:
In each of the rules above, a column number is referenced in the right-hand-side, which is typical for specifying item presentation.
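The key=value syntax, the 1-based column references, and the caret substitution convention described earlier can be sketched in a few lines of Python. This is only an illustration, not the EZChooser parser: the function names `parse_rule` and `resolve_value` are hypothetical, and the sketch assumes the caller only resolves values for keys that are meant to take column references (a bare integer on a key such as a data type or cluster count would not be a column reference).

```python
import re

# Matches a caret-delimited column reference such as ^17^.
CARET_REF = re.compile(r"\^(\d+)\^")


def parse_rule(line):
    """Split a single-line rule into (key parts, value) at the first '='."""
    key, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a rule: {line!r}")
    # Dot notation in the key is a hierarchical decomposition.
    return key.strip().split("."), value.strip()


def resolve_value(value, row):
    """Resolve a rule right-hand-side against one data row (a list of fields).

    Assumption: the rule's key is one that accepts column references.
    Column references are 1-based, so column n is row[n - 1].
    """
    if value.isdigit():
        # A bare integer is taken as a reference to a whole column.
        return row[int(value) - 1]
    # Otherwise substitute each ^n^ with the field from column n.
    return CARET_REF.sub(lambda m: row[int(m.group(1)) - 1], value)


# Example row borrowed from the sample data section.
row = ["Chevrolet", "Silverado 1500", "Best Buy", "full-size pickup", "$17000"]
print(parse_rule("dimension.4.label.text=Vehicle Class"))
print(resolve_value("2", row))                  # Silverado 1500
print(resolve_value("carimages/^1^.jpg", row))  # carimages/Chevrolet.jpg
```

Splitting at the first "=" only keeps values such as URLs intact even when they contain further "=" characters.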
Item descriptions

The keyword 'itemdesc' allows additional specification of how to describe an item at given points in the application. Are they vehicles, digital cameras, or other things? The glyph (icon) definition gives a graphical representation of the objects, so that different icons can be used for vehicles, cameras, etc. For example, here is how we have specified this extra information for a dataset of cars:
Dimension presentation

The 'dimension' keyword specifies which columns to include as data features in Multinav. The column numbers must refer to existing columns in the data section. Other attributes of a dimension, such as how its values are presented, its units, etc., can also be specified. For example, the next set of rules specifies presentation for a miles-per-gallon dimension:

dimension.7.label.text=City Fuel Efficiency

The first two lines above should be straightforward; they specify dimension and unit presentations that appear at the left edge of a dimension row. The third line specifies the column in which to find the URL link for that dimension, which will be presented as an underlined link on the dimension text label. This feature is intended to provide a hook for explanatory information about dimensions.

At times it may be important to explicitly specify the presentation of values. Note that values provide the basis for sorting items in each dimension row in EZChooser. Thus it may be important to preserve the underlying value for sorting purposes while still allowing a different presentation string. An example might be a dimension such as screen resolution. Here the application designer may want the underlying value to be the total number of pixels (an integer) but the presentation to be the string "width X height." If so, you would specify it as in the following example:

dimension.8.unit.label.text=Screen Resolution
dimension.8.value.text=9
where column 9 contains entries like "1024X768". Note that if you use this convention for labeling dimension values, a single value can have only a single text presentation. In other words, you cannot map the same value to different presentation strings, even though different items may be involved.

Special formatting

Text formats can be specified both for parsing the input file and for presentation within EZChooser. The keyword for parsing the input file is "format" and the keyword for presentation within EZChooser is "presentationFormat." An example of a formatting instruction in rules follows. This rule says that float values in dimension 10 should be presented with one decimal place and a comma every third digit to the left of the decimal point.

dimension.10.datatype=float
dimension.10.presentationFormat=#,##0.0

Here is another example, for dates. This combination of rules says that dates in the MNV file use the numerical slash pattern month/day/year, e.g., 05/02/99. However, the value presented in EZChooser dimension rows should be the year only, with four digits, e.g., 1999.

dimension.9.datatype=date
dimension.9.format=MM/dd/yy
dimension.9.presentationFormat=yyyy
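To make the effect of these two formatting examples concrete, here is a small Python approximation. The patterns above appear to follow the pattern syntax of Java's DecimalFormat and SimpleDateFormat classes; that is an assumption, and the Python format strings below are hand-translated equivalents for these two patterns only, not a general pattern interpreter. The function names are hypothetical.

```python
from datetime import datetime


def present_float(value):
    """Approximate '#,##0.0': grouping commas and one decimal place."""
    return f"{value:,.1f}"


def present_year(text):
    """Parse a date written as 'MM/dd/yy' and present it as 'yyyy'."""
    # %m/%d/%y is the strptime counterpart assumed for MM/dd/yy.
    parsed = datetime.strptime(text, "%m/%d/%y")
    return parsed.strftime("%Y")


print(present_float(1234567.89))  # 1,234,567.9
print(present_year("05/02/99"))   # 1999
```

Note that the parsing format and the presentation format are independent: the stored value remains a full date, which is what allows it to be sorted and later presented in a different form.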
IDs

Items and dimensions can have IDs specified. IDs help with initialization of the Multinav Navigator, so that in either the data file or the applet parameters the user can easily specify which dimensions are to be displayed, in what order, and which items are to be marked initially. For example, you may want to assign an ID to a dimension as follows:

dimension.6.ID=high price attribute

Now you may (in fact you have to) refer to this attribute via the string "high price attribute". One place this is commonly used is the initialization section, where one specifies the order in which dimensions are presented. Dimensions may be referred to by position if no ID is specified.

Items have their IDs specified via a column reference. A typical example would be

item.ID=7

which indicates that each item's ID is in the cells of column 7.

Clustering of values

It is often advantageous to have the compiler aggregate or cluster values within dimensions. As data sets get larger, this becomes increasingly critical: a user could, for instance, be presented with just 7 buttons for value ranges that stem from 100 different values in the original data. The algorithms for aggregation are type specific.
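As a rough illustration of what aggregating values into ranges means, the following Python sketch splits the distinct values of a dimension into a requested number of ranges of nearly equal size. The compiler's actual algorithms are not described in this document, so this equal-count split is purely a hypothetical stand-in, and the function name `cluster_ranges` is invented for the example.

```python
def cluster_ranges(values, number_of_clusters):
    """Group distinct values into at most number_of_clusters (low, high) ranges.

    Hypothetical equal-count split; not the MNV Compiler's algorithm.
    """
    distinct = sorted(set(values))
    if not distinct or number_of_clusters < 1:
        return []
    # Never create more clusters than there are distinct values.
    k = min(number_of_clusters, len(distinct))
    base, extra = divmod(len(distinct), k)
    ranges = []
    start = 0
    for i in range(k):
        # The first `extra` clusters absorb one leftover value each.
        size = base + (1 if i < extra else 0)
        chunk = distinct[start:start + size]
        ranges.append((chunk[0], chunk[-1]))
        start += size
    return ranges


# 100 distinct values reduced to 7 ranges, one per button.
for low, high in cluster_ranges(range(1, 101), 7):
    print(f"{low}-{high}")
```

Each resulting pair corresponds to one value-range button; with 100 distinct values and 7 clusters, the first two ranges hold 15 values and the remaining five hold 14.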
Of the three types of clustering supported, we imagine that string-based clustering may be less useful than the numeric and date types. Here is an example of how to invoke clustering. You specify the number of clusters you would like for individual dimensions. (7, plus or minus 2, seems to be a good target for the number of clusters.)

dimension.9.NumberOfClusters=7

A good tip is to use special formatting in combination with clustering. The compiler respects the formatting instructions when it creates the strings representing value ranges. For instance, if you want a date dimension to show just a two-digit year with an apostrophe when it prints values, you may use instructions such as these:

dimension.9.datatype=date
dimension.9.presentationFormat=''yy
dimension.9.NumberOfClusters=7

The resulting presentation on a value button would be something like this:

'77-'89

Complete BNF spec

The complete BNF specification for the rules section follows:
The matching between keys and values should be obvious; for example, a datatype key should only have one of the dataTypeKeys as its value. Here are some points that cannot be seen from the specification above.
Example rule section

Here are some examples of rules with accompanying comments:

<RULES>

Error checking

Errors being checked
Errors not being checked, but should be
Initialization Section

These are the initialization parameters that can optionally be specified in the data file and/or as applet parameters. Values specified through applet parameters override those in the data file:
Example initialization section

The following example mixes the conventions for dimension reference. Those dimensions that have been named, such as "low price attribute," are referred to as such. Where dimensions have not been named, they are referred to by position. There should be no line-feeds in such a spec.

<INIT>