Cube DSL

From OS.bee documentation
Revision as of 13:36, 8 August 2016 by Mollik (Talk | contribs) (Time dimensions)

Jump to: navigation, search

Copyright Notice

All rights are reserved by Compex Systemhaus GmbH. In particular, duplications, translations, microfilming, saving and processing in electronic systems are protected by copyright. Use of this manual is only authorized with the permission of Compex Systemhaus GmbH. Infringements of the law shall be punished in accordance with civil and penal laws. We have taken utmost care in putting together texts and images. Nevertheless, the possibility of errors cannot be completely ruled out. The Figures and information in this manual are only given as approximations unless expressly indicated as binding. Amendments to the manual due to amendments to the standard software remain reserved. Please note that the latest amendments to the manual can be accessed through our helpdesk at any time. The contractually agreed regulations of the licensing and maintenance of the standard software shall apply with regard to liability for any errors in the documentation. Guarantees, particularly guarantees of quality or durability can only be assumed for the manual insofar as its quality or durability are expressly stipulated as guaranteed. If you would like to make a suggestion, the Compex Team would be very pleased to hear from you.

(c) 2016-2024 Compex Systemhaus GmbH

Introduction

MDX and OLAP Cube

MDX(MultiDimensional eXpressions) is a query language for OLAP (OnLine Analytical Processing) databases, much like SQL is a query language for relational databases. It is also a calculation language, with syntax similar to spreadsheet formulas.

SQL was designed to query dimensional data structures, called tables, where data are organized in rows and columns. In OLAP, data are organized around multiple measures, dimensions, hierarchies, and levels.

Similar to the way tables and columns are central to SQL, dimensions, hierarchies, and levels, are the centerpieces of MDX. They are mapping business models into language-specific concepts (e.g. a list of countries will be mapped as an MDX dimension). Measures in MDX are a special case of dimensions and they hold numeric values. The concrete value of a dimension (e.g. Paris) is a member of the dimension and is similar to the value of an SQL table.

An OLAP Cube is a collection of dimensions indexing a list of measures. E.g. geometrical cube can have three dimensions, whereas an logistical cube can have many more dimensions.


More information:

http://en.wikipedia.org/wiki/MultiDimensional_eXpressions

http://en.wikipedia.org/wiki/OLAP_cube

Mondrian

Mondrian is an open source OLAP server, written in Java. It supports the MDX query language and the XML for analysis and olap4j interface specifications. It reads from SQL and other data sources and aggregates data in a memory cache.

Mondrian is used for:

  • High performance, interactive analysis of large or small volumes of information
  • Dimensional exploration of data, for example analyzing sales by product line, by region, by time period
  • Parsing the MDX language into SQL to retrieve answers to dimensional queries
  • High-speed queries through the use of aggregate tables in the RDBMS(Relational DataBase Management System)
  • Advanced calculations using the calculation expressions of the MDX language

Schema

A schema defines a multi-dimensional database. It contains a logical model, consisting of cubes, hierarchies, and members, and a mapping of this model onto a physical model.

The logical model consists of the constructs used to write queries in MDX language: cubes, dimensions, hierarchies, levels, and members.

The physical model is the source of the data which is presented through the logical model. It is typically a star schema, which is a set of tables in a relational database.

Schema file

The schema file defines the dimensions and measures for the cube and maps them to the DB tables. The product and location dimensions contain several levels each. Each level defines its corresponding field in the DB table. The different values of the field become members of the level.

Note: The levels must be specified in the intended order of breakdown.

  • The structure of the XML document is as follows:
    <Schema>
       <Cube>
          <Table>
             <AggName>
                aggElements
             <AggPattern>
                aggElements
             <Dimension>
                <Hierarchy>
                   relation
                <Closure/>
                <Level>
                <KeyExpression>
                   <SQL/>
                <NameExpression>
                   <SQL/>
                <CaptionExpression>
                   <SQL/>
                <OrdinalExpression>
                   <SQL/>
                <ParentExpression>
                   <SQL/>
                <Property>
                   <PropertyExpression>
                      <SQL/>
             <DimensionUsage>
             <Measure>
                <MeasureExpression>
                   <SQL/>
                <CalculatedMemberProperty/>
             <CalculatedMember>
                <Formula/>
                <CalculatedMemberProperty/>
             <NamedSet>
                <Formula/>
             <VirtualCube>
                <CubeUsages>
                   <CubeUsage>
                <VirtualCubeDimension>
                <VirtualCubeMeasure>
             <Role>
                <SchemaGrant>
                   <CubeGrant>
                      <DimensionGrant>
                      <HierarchyGrant>
                         <MemberGrant/>
          <Union>
             <RoleUsage/>
       <UserDefinedFunction/>
       <Parameter/>
    

  • relation ::=
       <Table>
          <SQL/>
       <View>
          <SQL/>
       <InlineTable>
          <ColumnDefs>
             <ColumnDef>
          <Rows>
             <Row>
                <Value>
       <Join>
          relation
    

  • aggElement ::=
       <AggExclude>
       <AggFactCount>
       <AggIgnoreColumn>
       <AggForeignKey>
       <AggMeasure>
       <AggLevel>
    

Logical elements
Element Description
<Cube> A collection of dimensions and measures, all centered on a fact table.
<VirtualCube> A cube defined by combining the dimensions and measures of one or more cubes.
A measure originating from another cube can be a <CalculatedMember>.
<CubeUsages> Base cubes that are imported into a virtual cube
<CubeUsage> Usage of a base cube by a virtual cube.
<VirtualCubeDimension> Usage of a dimension by a virtual cube.
<VirtualCubeMeasure> Usage of a measure by a virtual cube.
<Dimension> Dimension
<DimensionUsage> Usage of a shared dimension by a cube.
<Hierarchy> Hierarchy
<Level> Level of a hierarchy
<KeyExpression> SQL expression used as key of the level, in lieu of a column.
<NameExpression> SQL expression used to compute the name of a member, in lieu of Level.nameColumn.
<CaptionExpression> SQL expression used to compute the caption of a member, in lieu ofLevel.captionColumn.
<OrdinalExpression> SQL expression used to sort members of a level, in lieu ofLevel.ordinalColumn.
<ParentExpression> SQL expression used to compute a measure, in lieu ofLevel.parentColumn.
<Property> Member property. The definition is against a hierarchy or level, but the property will be available to all members.
<Measure> Measure
<CalculatedMember> A member whose value is derived using a formula, defined as part of a cube.
<NamedSet> A set whose value is derived using a formula, defined as part of a cube.
Physical elements
Element Description
<Table> Fact or dimension table.
<View> Defines a 'table' using a SQL query, which can have different variants for different underlying databases.
<Join> Defines a 'table' by joining a set of queries.
<InlineTable> Defines a table using an inline dataset.
<Closure> Maps a parent-child hierarchy onto a closure table.
Aggregate tables
Element Description
<AggExclude> Exclude a candidate aggregate table by name or pattern matching.
<AggName> Declares an aggregate table to be matched by name.
<AggPattern> Declares a set of aggregate tables by regular expression pattern.
<AggFactCount> Specifies name of the column in the candidate aggregate table which contains the number of fact table rows.
<AggIgnoreColumn> Tells Mondrian to ignore a column in an aggregate table.
<AggForeignKey> Maps foreign key in the fact table to a foreign key column in the candidate aggregate table.
<AggMeasure> Maps a measure to a column in the candidate aggregate table.
<AggLevel> Maps a level to a column in the candidate aggregate table.
Access control
Element Description
<Role> An access-control profile.
<SchemaGrant> A set of rights to a schema.
<CubeGrant> A set of rights to a cube.
<HierarchyGrant> A set of rights to a hierarchy and levels within that hierarchy.
<MemberGrant> A set of rights to a member and its children.
<Union> Definition of a set of rights as the union of a set of roles.
<RoleUsage> A reference to a Role.
Extensions
Element Description
<UserDefinedFunction> Declares a user-defined function.
<CellFormatter> Cell formatter.
<MemberFormatter> Member formatter.
<PropertyFormatter> Property formatter.
<Script> Script fragment to implement an SPI such as user-defined function, member formatter, or cell formatter.
Miscellaneous
Element Description
<Annotations> Holder for annotations.
<Annotation> User-defined property attached to a metadata element.
<Parameter> Part of the definition of a Hierarchy; passed to a MemberReader, if present.
<CalculatedMemberProperty> Property of a calculated member.
<Formula> Holds the formula text within a <NamedSet> or <CalculatedMember>.
<ColumnDefs> Holder for <ColumnDef> elements.
<ColumnDef> Definition of a column in an <InlineTable> dataset.
<Rows> Holder for <Row> elements.
<Row> Row in an <InlineTable> dataset.
<Value> Value of a column in an <InlineTable> dataset.
<MeasureExpression> SQL expression used to compute a measure, in lieu ofMeasure.column.
<SQL> The SQL expression for a particular database dialect.
Cubes

A cube is a named collection of measures and dimensions. The one thing the measures and dimensions have in common is the fact table. The fact table holds the columns from which measures are calculated, and contains references to the tables which hold the dimensions.

Syntax:

<Cube name="{cube name}">
    <Table name="{table name}"/>
    ...
</Cube>


The fact table is defined using the <Table> element. If the fact table is not in the default schema, you can provide an explicit schema using the "schema" attribute, for example


Syntax:

<Table schema="{schema name}" name="{table name}"/>

You can also use the <View> construct to build more complicated SQL statements. The <Join> construct is not supported for fact tables.


Measures

Each measure has a name, a column in the fact table, and an aggregator.

Syntax:

<Measure name="{measure name}" column="{column name}" 
    aggregator="{aggregator name}" datatype="{type name}" 
    formatString="{format}"/>


Example:

<Measure name="Store Sales" column="store_sales" aggregator="sum" 
    datatype="Numeric" formatString="#,###.00"/>


The aggregator are

  • sum - returns the sum value of the specified set expression.
  • count - returns the number of elements in the specified set.
  • min - returns the min. value of the specified set expression.
  • max - returns the max. value of the specified set expression.
  • avg - compute the average value over the specified set.
  • distinct-count - returns the number of unique non empty elements in the specified set. It has some limitations if your cube contains a parent-child hierarchy.

The optional datatype attribute specifies how cell values are represented in Mondrian's cache, and how they are returned via XML for Analysis.

The datatype attribute can have values

  • String
  • Integer, default value for "count" and "distinct-count" measures
  • Numeric, default value, except for "count" and "distinct-count" measures
  • Boolean
  • Date
  • Time
  • Timestamp.

An optional formatString attribute specifies how the value is to be printed. The ',' and '.' symbols are locale-sensitive.

A measure can have a caption attribute to be returned by the Member.getCaption() method instead of the name. Defining a specific caption does make sense if special letters (e.g. Σ or Π) are to be displayed:

Example:

<Measure name="Sum X" column="sum_x" aggregator="sum" caption="&#931; X"/>
Dimensions, hierarchies, levels

A member is a point within a dimension determined by a particular set of attribute values. The gender hierarchy has the two members 'M' and 'F'. 'San Francisco', 'California' and 'USA' are all members of the store hierarchy.

A hierarchy is a set of members organized into a structure for convenient analysis. For example, the store hierarchy consists of the store name, city, state, and nation. The hierarchy allows you to form intermediate sub-totals: the sub-total for a state is the sum of the sub-totals of all of the cities in that state, each of which is the sum of the sub-totals of the stores in that city.

A level is a collection of members which have the same distance from the root of the hierarchy.

A dimension is a collection of hierarchies which discriminate on the same fact table attribute (say, the day that a sale occurred).

Syntax:

<Dimension name="{dimension name}" foreignKey="{key name}">
    <Hierarchy hasAll="true" primaryKey="{key name}">
        <Table name="{table name}"/>
        <Level name="{level name}" column="{column name}" 
            uniqueMembers="true"/>
    </Hierarchy>
</Dimension>


Example:

<Dimension name="Gender" foreignKey="customer_id">
    <Hierarchy hasAll="true" primaryKey="customer_id">
         <Table name="customer"/>
         <Level name="Gender" column="gender" uniqueMembers="true"/>
    </Hierarchy>
</Dimension>

This dimension consists of a single hierarchy, which consists of a single level called Gender.

The values for the dimension come from the gender column in the customer table. The "gender" column contains two values, 'F' and 'M', so the Gender dimension contains the members [Gender].[F] and [Gender].[M].

For any given sale, the gender dimension is the gender of the customer who made that purchase. This is expressed by joining from the fact table "sales_fact_1997.customer_id" to the dimension table "customer.customer_id".


Syntax:

<Table schema="{schema name}" name="{table name}"/>

You can also use the <View> construct to build more complicated SQL statements. The <Join> construct is not supported for fact tables.


"ALL" member

By default, every hierarchy contains a top level called (All), which contains a single member called (All {hierarchyName}). This member is parent of all other members of the hierarchy, and thus represents a grand total. It is also the default member of the hierarchy; that is, the member which is used for calculating cell values when the hierarchy is not included on an axis or in the slicer. The allMemberName and allLevelName attributes override the default names of the all level and all member.

If the <Hierarchy> element has hasAll="false", the 'all' level is suppressed. The default member of that dimension will now be the first member of the first level; for example, in a Time hierarchy, it will be the first year in the hierarchy. Changing the default member can be confusing, so you should generally use hasAll="true".


Time dimensions

Time dimensions based on year/month/week/day are coded differently in the Mondrian schema due to the MDX time related functions :

  • ParallelPeriod([level[, index[, member]]]) –returns a member from a previous/next period with the same position in the level as the specified member.
  • PeriodsToDate([level[, member]]) –returns the set of members starting from the first sibling of the specified member and ending with the specified member itself.
  • WTD([member]) –returns PeriodsToDate( Week_Level, Member ).
  • MTD([member]) –returns PeriodsToDate( Month_Level, Member ).
  • QTD([member]) –returns PeriodsToDate( Quarter_Level, Member ).
  • YTD([member]) –returns PeriodsToDate( Year_Level, Member ).
  • LastPeriod(index[, member]) –returns the members that stand before the specified member including the specified member itself within the level members.

Time dimensions have type="TimeDimension". The role of a level in a time dimension is indicated by the level's levelType attribute, whose allowable values are as follows:

levelType value Meaning
TimeYears Level is a year.
TimeQuarters Level is a quater.
TimeMonths Level is a month.
TimeWeeks Level is a week.
TimeDays Level is a day.


Example:

<Dimension name="Time" type="TimeDimension">
    <Hierarchy hasAll="true" allMemberName="All Periods" primaryKey="dateid">
        <Table name="datehierarchy"/>
        <Level name="Year" column="year" uniqueMembers="true" 
            levelType="TimeYears" type="Numeric"/>
        <Level name="Quarter" column="quarter" uniqueMembers="false" 
            levelType="TimeQuarters"/>
        <Level name="Month" column="month" uniqueMembers="false" 
            ordinalColumn="month" nameColumn="month_name" levelType="TimeMonths" 
            type="Numeric"/>
        <Level name="Week" column="week_in_month" uniqueMembers="false" 
            levelType="TimeWeeks"/>
        <Level name="Day" column="day_in_month" uniqueMembers="false" 
            ordinalColumn="day_in_month" nameColumn="day_name" 
            levelType="TimeDays" type="Numeric"/>
    </Hierarchy>
</Dimension>


Multiple hierarchies

A dimension can contain more than one hierarchy. By default, a hierarchy has the same name as its dimension.

Shared dimensions

A dimension can contain more than one hierarchy. By default, a hierarchy has the same name as its dimension.

When generating the SQL for a join, Mondrian needs to know which column to join to. If you are joining to a join, then you need to tell it which of the tables in the join that column belongs to (usually it will be the first table in the join). Because shared dimensions don't belong to a cube, you have to give them an explicit table (or other data source). When you use them in a particular cube, you specify the foreign key. This example shows the Store Type dimension being joined to the Sales cube using thesales_fact_1997.store_id foreign key, and to the Warehouse cube using the warehouse.warehouse_store_id foreign key:


Example:

<Dimension name="Store Type">
    <Hierarchy hasAll="true" primaryKey="store_id">
        <Table name="store"/>
        <Level name="Store Type" column="store_type" 
        uniqueMembers="true"/>
    </Hierarchy>
</Dimension>

<Cube name="Sales">
    <Table name="sales_fact_1997"/>
    ...
    <DimensionUsage name="Store Type" source="Store Type" 
    foreignKey="store_id"/>
</Cube>

<Cube name="Warehouse">
    <Table name="warehouse"/>
    ...
    <DimensionUsage name="Store Type" source="Store Type" 
    foreignKey="warehouse_store_id"/>
</Cube>