Windows Azure Diagnostics enables you to collect diagnostic data from an application running on Windows Azure. You can use diagnostic data to perform the following tasks for an application:

  • Debugging
  • Troubleshooting
  • Performance measurement
  • Resource usage monitoring
  • Traffic analysis
  • Capacity planning
  • Auditing

After the diagnostic data is collected, it must be transferred to Windows Azure Storage for persistence and access. Transfers can either be scheduled or on-demand.

The DiagnosticsMonitor is responsible for transferring file-based logs, performance counters, and event logs to Windows Azure Storage. You can configure DiagnosticsMonitor to also monitor a file directory in a Windows Azure instance. It can then push the changes in directory files to blobs in Windows Azure Storage, which are then accessible from any application having access to the storage account (even applications running locally, outside of Windows Azure). By making use of this feature, it’s easy to monitor files in Windows Azure instances without having to connect via Remote Desktop. For example, you can configure Diagnostics to monitor the log folder of Tomcat and access the log files that are pushed as blobs for debugging and data analysis.

The instances running diagnostics persist the diagnostics data to a centralized location in Windows Azure Table and Blob storage. This data can be retrieved later for analysis.

The February 2012 CTP release of Windows Azure SDK for Java does not offer an API to configure and host DiagnosticsMonitor, nor is there a REST API for this purpose. Diagnostics must therefore be enabled with .NET code. We authored a .NET executable named ConfigureAzureDiagnostics, which configures and hosts DiagnosticsMonitor. ConfigureAzureDiagnostics runs as a background start-up task in the worker role that hosts the Java application.

Note: The enabling of diagnostics is independent of any Java application you write.

We recently published CloudNinja for Java on GitHub. It is a reference application that illustrates how to build multi-tenant, Java-based applications for Windows Azure. CloudNinja for Java uses Windows Azure Diagnostics to capture performance counters and monitor Tomcat access logs.

Specifying Input for ConfigureAzureDiagnostics

To specify the configuration for DiagnosticsMonitor, we provide an input configuration XML file DiagnosticsConfiguration.xml. The schema of the XML file is:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="DiagnosticsConfig">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Directories">
          <xs:complexType>
            <xs:sequence maxOccurs="unbounded">
              <xs:element name="Directory">
                <xs:complexType>
                  <xs:attribute name="ContainerName" type="xs:string" use="required" />
                  <xs:attribute name="LocalPath" type="xs:string" use="required" />
                  <xs:attribute name="IsLocalPathRelative" type="xs:boolean" use="required" />
                  <xs:attribute name="DirectoryQuotaInMB" type="xs:int" use="optional" default="1" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="ScheduledTransferPeriodInSeconds" type="xs:int" use="optional" default="60" />
          </xs:complexType>
        </xs:element>
        <xs:element name="PerformanceCounters">
          <xs:complexType>
            <xs:sequence maxOccurs="unbounded">
              <xs:element name="PerformanceCounter">
                <xs:complexType>
                  <xs:attribute name="CounterName" type="xs:string" use="required" />
                  <xs:attribute name="SamplingRateInSeconds" type="xs:int" use="optional" default="10" />
                </xs:complexType>
              </xs:element>
            </xs:sequence>
            <xs:attribute name="ScheduledTransferPeriodInSeconds" type="xs:int" use="optional" default="60" />
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="StorageAccountConnectionString" type="xs:string" use="required" />
      <xs:attribute name="OverallQuotaInMB" type="xs:int" use="optional" default="4096" />
    </xs:complexType>
  </xs:element>
</xs:schema>

  • DiagnosticsConfig is the root element in the above schema. This element has the following attributes:
      o StorageAccountConnectionString is a required attribute to specify the storage that is used for transferring the diagnostics data.
      o OverallQuotaInMB is an optional attribute to specify the amount of local storage that is allocated for buffering the diagnostics data. The default value is 4 GB. To increase the default value:
          · Add the LocalStorage tag to ServiceDefinition.csdef.
          · Set sizeInMB to the required value for OverallQuotaInMB.

        <LocalResources>
          <LocalStorage name="DiagnosticStore" sizeInMB="8192" cleanOnRoleRecycle="false" />
        </LocalResources>

  • Directories is the first child. It configures the directories that contain the log files that are transferred to blob storage.
      o ScheduledTransferPeriodInSeconds is an optional attribute to specify the timespan after which the transfer must take place. The default value is 60 seconds.
      · Directory is the child node. It represents each directory from which contents must be transferred.
          o ContainerName is a required attribute to specify the name of the blob container to which the contents of the directory are transferred.
          o LocalPath is a required attribute to specify the location of the directory in your role instances.
          o IsLocalPathRelative is a required attribute to specify whether the location of the directory is relative.
          o DirectoryQuotaInMB is an optional attribute to specify the maximum size of the local buffer that is used for transferring contents from this directory. The default value is 1 GB.

    Note: The combined values of this attribute for all Directory nodes should not exceed the value specified for the OverallQuotaInMB attribute defined in the root element.

  • PerformanceCounters is the second child. It configures the system performance counters whose values are to be captured and transferred to Table Storage.
      o ScheduledTransferPeriodInSeconds is an optional attribute to specify the timespan after which the transfer must happen. The default value is 60 seconds.
      · PerformanceCounter is the child node. It represents a system performance counter whose value is to be captured and transferred.
          o CounterName is a required attribute to specify the system performance counter name.
          o SamplingRateInSeconds is an optional attribute to specify the sampling frequency for the capture of the system performance counter value. The default value is 10 seconds.
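The note about directory quotas can be checked before deployment. The following is a minimal Java sketch, assuming the configuration is available as a string and that missing optional attributes take the defaults declared in the schema. The class DiagnosticsQuotaCheck and its methods are hypothetical names used for illustration; they are not part of ConfigureAzureDiagnostics.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of ConfigureAzureDiagnostics.
class DiagnosticsQuotaCheck {

    // Defaults as declared in the schema (assumed to apply when an attribute is absent).
    static final int DEFAULT_OVERALL_QUOTA_MB = 4096;
    static final int DEFAULT_DIRECTORY_QUOTA_MB = 1;

    /** Parses the configuration XML and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    /** Reads an int attribute, falling back to the default when it is absent. */
    static int intAttribute(Element element, String name, int fallback) {
        String value = element.getAttribute(name); // empty string when missing
        return value.isEmpty() ? fallback : Integer.parseInt(value.trim());
    }

    /** Sums DirectoryQuotaInMB over every Directory node. */
    static int totalDirectoryQuotaMb(Element root) {
        NodeList directories = root.getElementsByTagName("Directory");
        int total = 0;
        for (int i = 0; i < directories.getLength(); i++) {
            Element directory = (Element) directories.item(i);
            total += intAttribute(directory, "DirectoryQuotaInMB", DEFAULT_DIRECTORY_QUOTA_MB);
        }
        return total;
    }

    /** True when the combined directory quotas do not exceed OverallQuotaInMB. */
    static boolean fitsOverallQuota(String xml) {
        Element root = parseRoot(xml);
        int overall = intAttribute(root, "OverallQuotaInMB", DEFAULT_OVERALL_QUOTA_MB);
        return totalDirectoryQuotaMb(root) <= overall;
    }

    public static void main(String[] args) {
        // Sample input; the paths and quota values are made up for the example.
        String xml = "<DiagnosticsConfig StorageAccountConnectionString='UseDevelopmentStorage=true'"
                + " OverallQuotaInMB='4096'>"
                + "<Directories>"
                + "<Directory ContainerName='wad-tomcat-logs' LocalPath='logs'"
                + " IsLocalPathRelative='true' DirectoryQuotaInMB='1024' />"
                + "</Directories>"
                + "<PerformanceCounters>"
                + "<PerformanceCounter CounterName='\\Memory\\Available MBytes' />"
                + "</PerformanceCounters>"
                + "</DiagnosticsConfig>";
        System.out.println("Fits overall quota: " + fitsOverallQuota(xml));
    }
}
```

A check like this could run as part of a build step, so that a misconfigured quota is caught before the role is deployed.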

The following is sample content for an input configuration XML file that relies on the default attribute values.

<DiagnosticsConfig StorageAccountConnectionString="UseDevelopmentStorage=true">
  <Directories>
    <Directory ContainerName="wad-tomcat-logs" LocalPath="apache-tomcat\logs" IsLocalPathRelative="true" />
  </Directories>
  <PerformanceCounters>
    <PerformanceCounter CounterName="\Memory\Available MBytes" />
    <PerformanceCounter CounterName="\Processor(*)\% Idle Time" />
  </PerformanceCounters>
</DiagnosticsConfig>

We do not recommend relying on the default attribute values. For example, for performance counters the default ScheduledTransferPeriodInSeconds is 60 seconds and the default SamplingRateInSeconds is 10 seconds. This means that performance counters are sampled every 10 seconds, and the collected samples are pushed to Windows Azure Table Storage every 60 seconds. While this seems fine, imagine running hundreds of compute instances, each pushing performance counters to Table Storage. Eventually this could become a storage and network bottleneck, and there’s a cost for transactions, albeit a small one (a penny per 10,000). Frequent sampling could also have a negative impact on the performance of the instances running the diagnostics.

The following is a sample input configuration file that sets ScheduledTransferPeriodInSeconds to 600 seconds and SamplingRateInSeconds to 120 seconds.

<DiagnosticsConfig StorageAccountConnectionString="UseDevelopmentStorage=true">
  <Directories ScheduledTransferPeriodInSeconds="600">
    <Directory ContainerName="wad-tomcat-logs" LocalPath="apache-tomcat\logs" IsLocalPathRelative="true" />
  </Directories>
  <PerformanceCounters ScheduledTransferPeriodInSeconds="600">
    <PerformanceCounter CounterName="\Memory\Available MBytes" SamplingRateInSeconds="120" />
    <PerformanceCounter CounterName="\Processor(*)\% Idle Time" SamplingRateInSeconds="120" />
  </PerformanceCounters>
</DiagnosticsConfig>

Specifying an appropriate SamplingRateInSeconds ensures that diagnostics data is generated no more often than the analysis requires. Likewise, if performance counters are used for auto-scaling and the auto-scale code only analyzes them every 10-15 minutes, there’s no need to upload this data to storage more frequently.
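To get a feel for the difference between the two configurations, the following Java sketch estimates the daily volume for a hypothetical deployment of 100 instances that each sample 2 counters. It rests on two simplifying assumptions that are not taken from the Diagnostics documentation: every sample becomes one table entity, and every scheduled transfer costs at least one storage transaction per instance. The class and method names are made up for this example.

```java
// Hypothetical estimator for illustration only.
// Assumes one table entity per counter sample and at least one
// storage transaction per scheduled transfer per instance.
class DiagnosticsLoadEstimate {

    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Number of counter samples generated per day across all instances. */
    static long samplesPerDay(int instances, int counters, int samplingRateSeconds) {
        return (long) instances * counters * (SECONDS_PER_DAY / samplingRateSeconds);
    }

    /** Number of scheduled transfers per day across all instances. */
    static long transfersPerDay(int instances, int transferPeriodSeconds) {
        return (long) instances * (SECONDS_PER_DAY / transferPeriodSeconds);
    }

    public static void main(String[] args) {
        int instances = 100; // hypothetical deployment size
        int counters = 2;    // counters sampled on each instance

        // Default values: sample every 10 s, transfer every 60 s.
        System.out.println("Default samples/day:   " + samplesPerDay(instances, counters, 10));
        System.out.println("Default transfers/day: " + transfersPerDay(instances, 60));

        // Tuned values: sample every 120 s, transfer every 600 s.
        System.out.println("Tuned samples/day:     " + samplesPerDay(instances, counters, 120));
        System.out.println("Tuned transfers/day:   " + transfersPerDay(instances, 600));
    }
}
```

Under these assumptions, the tuned configuration produces one twelfth of the samples and one tenth of the transfers of the default configuration. A wildcard counter such as Processor(*) may expand to several counter instances, so the real sample count can be higher.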

Hosting DiagnosticsMonitor

ConfigureAzureDiagnostics is provided with the path of the input configuration XML file. ConfigureAzureDiagnostics maps the values in the input configuration XML file to add data sources for directories and performance counters in DiagnosticsMonitor as shown in the following code snippet.

Note: The type DiagnosticsConfig is auto-generated from the configuration XML schema file using xsd.exe.

Here’s a snippet of the relevant C# code in the .NET project.

// Namespaces used: System, System.IO, System.Xml.Serialization,
// Microsoft.WindowsAzure, Microsoft.WindowsAzure.Diagnostics.

// Deserialize the input configuration XML file into the generated type.
XmlSerializer serializer = new XmlSerializer(typeof(DiagnosticsConfig));
DiagnosticsConfig diagConfig = null;
using (Stream fs = File.OpenRead("DiagnosticsConfiguration.xml"))
{
    diagConfig = serializer.Deserialize(fs) as DiagnosticsConfig;
}

// Start from the default configuration and apply the overall quota.
DiagnosticMonitorConfiguration config = DiagnosticMonitor.GetDefaultInitialConfiguration();
config.OverallQuotaInMB = diagConfig.OverallQuotaInMB;

// Add one directory data source per Directory node.
if (diagConfig.Directories.Directory != null)
{
    foreach (DiagnosticsConfigDirectoriesDirectory dir in diagConfig.Directories.Directory)
    {
        DirectoryConfiguration directoryConfig = new DirectoryConfiguration();
        directoryConfig.Container = dir.ContainerName;
        if (dir.IsLocalPathRelative)
        {
            // Relative paths are resolved against the current directory.
            directoryConfig.Path = Path.Combine(Environment.CurrentDirectory, dir.LocalPath);
        }
        else
        {
            directoryConfig.Path = dir.LocalPath;
        }
        directoryConfig.DirectoryQuotaInMB = dir.DirectoryQuotaInMB;
        config.Directories.DataSources.Add(directoryConfig);
    }
    config.Directories.ScheduledTransferPeriod = TimeSpan.FromSeconds(diagConfig.Directories.ScheduledTransferPeriodInSeconds);
}

// Add one performance counter data source per PerformanceCounter node.
if (diagConfig.PerformanceCounters.PerformanceCounter != null)
{
    foreach (DiagnosticsConfigPerformanceCountersPerformanceCounter perf in diagConfig.PerformanceCounters.PerformanceCounter)
    {
        PerformanceCounterConfiguration perfConfig = new PerformanceCounterConfiguration();
        perfConfig.CounterSpecifier = perf.CounterName;
        perfConfig.SampleRate = TimeSpan.FromSeconds(perf.SamplingRateInSeconds);
        config.PerformanceCounters.DataSources.Add(perfConfig);
    }
    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromSeconds(diagConfig.PerformanceCounters.ScheduledTransferPeriodInSeconds);
}

// Start the monitor against the configured storage account.
DiagnosticMonitor.Start(CloudStorageAccount.Parse(diagConfig.StorageAccountConnectionString), config);

Running ConfigureAzureDiagnostics

The Windows Azure project must be appropriately configured so that the executable ConfigureAzureDiagnostics, dependencies of the executable, and the input configuration XML file DiagnosticsConfiguration.xml are added to the approot directory of the worker role of the Windows Azure project. Alternatively, these files may be retrieved from Blob Storage, assuming you upload them to Blob Storage beforehand. This would allow the diagnostics configuration app and configuration file to be updated independently of your code (and allow you to update without redeploying). In this example, we’ll assume they’re simply added to the Windows Azure project.

The following dependencies are found in %ProgramFiles%\Windows Azure SDK\v1.6\ref:

  • Microsoft.WindowsAzure.Diagnostics.dll
  • Microsoft.WindowsAzure.StorageClient.dll

The source code for the ConfigureAzureDiagnostics application is available at https://github.com/PersistentSys/cloudninja-for-java/tree/master/ConfigureAzureDiagnosticsTool.

The ConfigureAzureDiagnostics binary and related files are available at https://github.com/PersistentSys/cloudninja-for-java/tree/master/AzureDiagnosticsTool.

ConfigureAzureDiagnostics uses the mixed-mode assembly Microsoft.WindowsAzure.ServiceRuntime.dll, which requires additional runtime configuration to execute. This configuration is specified in the executable configuration file ConfigureAzureDiagnostics.exe.config, which is located in the approot directory and has the following content.

<?xml version="1.0"?>
<configuration>
  <startup useLegacyV2RuntimeActivationPolicy="true">
    <supportedRuntime version="v4.0" />
  </startup>
</configuration>

The following image illustrates the approot directory of the worker role. This directory contains ConfigureAzureDiagnostics and the related files along with the application binaries (HelloWorld.war).

[Image: approot directory of the worker role]

The ServiceDefinition.csdef of the Windows Azure project must be configured to run ConfigureAzureDiagnostics as a background task. To do this, modify the Startup node to include the Task node <Task commandLine="util/.start.cmd ConfigureAzureDiagnostics.exe" executionContext="elevated" taskType="background">.

<?xml version="1.0" encoding="utf-8"?>
<ServiceDefinition name="WindowsAzureProject1" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="WorkerRole1" vmsize="Small">
    <Startup>
      <Task . . . >
      </Task>
      <Task commandLine="util/.start.cmd ConfigureAzureDiagnostics.exe" executionContext="elevated" taskType="background">
      </Task>
    </Startup>
  </WorkerRole>
</ServiceDefinition>

After deploying the Windows Azure project, ConfigureAzureDiagnostics runs as a background task in the worker role instances. As a result, directory contents and performance counters are transferred to Windows Azure Storage.

Querying Table Storage to Retrieve Performance Counters

ConfigureAzureDiagnostics collects performance counters and persists them to a Windows Azure table named WADPerformanceCountersTable. The following image illustrates sample data in this table.

[Image: sample data in WADPerformanceCountersTable]

We can query WADPerformanceCountersTable to retrieve performance counters for analysis.

The following code snippet retrieves performance counters generated in the last five minutes.

try {
    String tableName = "WADPerformanceCountersTable";

    CloudStorageAccount storageAccount =
            CloudStorageAccount.parse(storageConnectionString);
    CloudTableClient tableClient =
            storageAccount.createCloudTableClient();

    Calendar currentTime = Calendar.getInstance();

    // Create a filter for 'timestamp less than current time'
    String upperBound = TableQuery.generateFilterCondition(
            TableConstants.TIMESTAMP,
            QueryComparisons.LESS_THAN,
            currentTime.getTime());

    currentTime.add(Calendar.MINUTE, -5);

    // Create a filter for 'timestamp greater than (current time - 5 min)'
    String lowerBound = TableQuery.generateFilterCondition(
            TableConstants.TIMESTAMP,
            QueryComparisons.GREATER_THAN,
            currentTime.getTime());

    // Combine both filters with the AND operator, which results in a filter
    // selecting entities generated in the last 5 minutes.
    String filter = TableQuery.combineFilters(
            upperBound, Operators.AND, lowerBound);

    // Create a table query by specifying the table name,
    // WADPerfCountersEntity as the entity type, and the filter expression
    TableQuery<WADPerfCountersEntity> query = TableQuery.from(
            tableName, WADPerfCountersEntity.class).where(filter);

    WADPerfCountersEntityResolver resolver =
            new WADPerfCountersEntityResolver();

    // Iterate over the results
    for (WADPerfCountersEntity perfCountersEntity :
            tableClient.execute(query, resolver)) {
        System.out.println("\nCounterName :: " +
                perfCountersEntity.getCounterName() + "\nCounterValue :: " +
                perfCountersEntity.getCounterValue());
    }
} catch (Exception e) {
    e.printStackTrace();
}

In the above code:

  • storageConnectionString represents the connection string for Windows Azure Storage.
  • The WADPerfCountersEntity class extends TableServiceEntity and represents an entity (row) of WADPerformanceCountersTable.
  • WADPerfCountersEntityResolver implements EntityResolver and provides mapping between the entities that are retrieved from WADPerformanceCountersTable and WADPerfCountersEntity.

Looking more closely at the entity properties, you’ll see:

  • DeploymentId. This property is used to filter for a specific deployment. This is important when multiple apps are writing to the same diagnostics storage account.
  • Role. You may choose to collect counters for a specific role, which is very important when auto-scaling.
  • RoleInstance. Each running VM instance has its own set of counters. When auto-scaling, it’s often helpful to aggregate or average these values across all instances of a role.
  • CounterName. You may want to query for very specific counters and treat them separately.
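Once the rows have been retrieved, these properties can drive the analysis. The following Java sketch averages one counter across all instances of a role within a deployment, which is the kind of aggregation an auto-scaling rule might use. The Sample class is a hypothetical stand-in for WADPerfCountersEntity that holds only the properties listed above plus the counter value, and the deployment ID, role, and instance names are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Sample stands in for a row of WADPerformanceCountersTable.
class CounterAggregation {

    /** Minimal, immutable view of one performance counter row. */
    static final class Sample {
        final String deploymentId;
        final String role;
        final String roleInstance;
        final String counterName;
        final double counterValue;

        Sample(String deploymentId, String role, String roleInstance,
               String counterName, double counterValue) {
            this.deploymentId = deploymentId;
            this.role = role;
            this.roleInstance = roleInstance;
            this.counterName = counterName;
            this.counterValue = counterValue;
        }
    }

    /** Averages one counter over every instance of a role in a deployment. */
    static double averageForRole(List<Sample> samples, String deploymentId,
                                 String role, String counterName) {
        double sum = 0;
        int count = 0;
        for (Sample s : samples) {
            if (s.deploymentId.equals(deploymentId)
                    && s.role.equals(role)
                    && s.counterName.equals(counterName)) {
                sum += s.counterValue;
                count++;
            }
        }
        if (count == 0) {
            throw new IllegalArgumentException("No matching samples");
        }
        return sum / count;
    }

    public static void main(String[] args) {
        String counter = "\\Memory\\Available MBytes";
        List<Sample> samples = new ArrayList<Sample>();
        samples.add(new Sample("deployment-1", "WorkerRole1", "WorkerRole1_IN_0", counter, 1024.0));
        samples.add(new Sample("deployment-1", "WorkerRole1", "WorkerRole1_IN_1", counter, 2048.0));
        // Rows from another deployment are ignored by the filter.
        samples.add(new Sample("deployment-2", "WorkerRole1", "WorkerRole1_IN_0", counter, 512.0));

        System.out.println("Average available MBytes: "
                + averageForRole(samples, "deployment-1", "WorkerRole1", counter));
    }
}
```

Filtering on DeploymentId first matters when several applications share one diagnostics storage account, because rows from unrelated deployments would otherwise skew the average.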

Click here to access the sample code to retrieve performance counters from WADPerformanceCountersTable.

Diagnostics Storage Account

One subtle point about storing diagnostics data is that, wherever possible, it should be stored in its own storage account. There are several reasons for this:

  • You may want to grant access to a third-party monitoring service. When you hand out the storage account key, you provide access to the entire storage account. You do not want to expose customer data to third-party applications.
  • Storage accounts have specific transactional and bandwidth limits, which are published in this article. If your application has high-volume transactions and bandwidth against storage, it’s possible that the additional transactions and bandwidth from diagnostics could cause a performance bottleneck.

Storage accounts are free to set up, so it doesn’t add any cost to create multiple storage accounts for a single application. Billing is specifically based on storage consumed, transactions executed, and bandwidth out of the datacenter.

Summary

In this article, we discussed how to configure diagnostics for Java applications. Diagnostics data is persisted centrally to Windows Azure tables and blobs, from which it can be retrieved for analysis. Diagnostics data helps in troubleshooting, performance measurement, resource usage monitoring, and so on.