Azure and blob write performance

by ingvar, 6 January 2011 21:07

Edit - 22nd August 2011

Please read the comment below by Joe Giardino regarding the relatively poor performance of the OpenWrite method. He has a very good explanation!

Introduction

During my work at Composite on C1 I found out that some ways of adding/uploading data to Azure blob storage are faster than others. So I did some benchmarking on ways to add data to a block blob. I will start by listing the results, and below that you can see the code I used to do the testing. In all tests the loopCount was 50; see the code below for more information on the loopCount. The numbers in the tables are the averages, in milliseconds, over these 50 loops.

Results

Azure test results (average time in milliseconds)

As expected, the blob is a little slower than writing to the local disk. But what surprised me is that the OpenWrite method is much, much slower than the other methods for adding data to the blob. Unfortunately I started out using the OpenWrite method and used it a lot. This really slowed down my solution. It got so slow that I started getting ThreadAbortExceptions and timeouts.

Size (KB)   Local disk   UploadByteArray   UploadFile   UploadFromStream   OpenWrite
50          0            31                64           33                 222
100         0            41                33           37                 240
150         0            42                40           43                 235
200         0            50                43           44                 227
250         0            56                52           44                 227
300         0            55                53           51                 242
350         57           55                79           53                 246
400         57           60                90           60                 263
450         55           72                95           61                 258
500         50           76                73           68                 278


Local dev fabric test results (average time in milliseconds)

I included this because I made the tests work locally first and then deployed them on Azure. The numbers show the same pattern as the numbers from the Azure run.

Size (KB)   Local disk   UploadByteArray   UploadFile   UploadFromStream   OpenWrite
50          0            89                78           80                 338
100         0            90                86           89                 398
150         0            128               124          129                411
200         0            138               137          136                426
250         0            144               151          148                436
300         0            179               178          185                521
350         19           197               193          192                561
400         16           234               230          229                550
450         14           245               249          245                541
500         14           248               247          249                555


The test code

Base code for all tests

This finds the path of the local test file and creates the buffer that is written to the local disk or to the block blob. The test parameters testBufferSize and loopCount are defined here as well.

/* Test parameters: testBufferSize was varied from 50 KB to 500 KB
   between runs; loopCount was 50 in all tests */
int testBufferSize = 50 * 1024;
int loopCount = 50;

string localPath = Path.Combine(
    HttpContext.Current.Request.PhysicalApplicationPath, "BlobTestFile.dat");

if (File.Exists(localPath)) File.Delete(localPath);

/* Setting up the buffer with a repeating byte pattern */
byte[] testBuffer = new byte[testBufferSize];
for (int i = 0; i < testBufferSize; i++)
{
    testBuffer[i] = (byte)(i % 256);
}

Base code for all blob tests

CloudStorageAccount account =
   CloudStorageAccount.FromConfigurationSetting("BlobConnectionString");

CloudBlobClient client =
   account.CreateCloudBlobClient();

CloudBlobContainer container =
   client.GetContainerReference("mycontainer"); /* Remember lower casing only */

container.CreateIfNotExist();
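
Note that FromConfigurationSetting requires a configuration setting publisher to be registered once per process before it is called, typically in the role's OnStart or in Application_Start; RoleEnvironment lives in Microsoft.WindowsAzure.ServiceRuntime. A minimal sketch of the usual v1.x registration:

/* Register once before calling FromConfigurationSetting;
   without this registration the call throws at runtime */
CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
    configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)));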

Local disk writes test code

Simply using the System.IO.FileStream class for writing the buffer to disk.

int diskWriteTime1 = Environment.TickCount;
for (int i = 0; i < loopCount; i++)
{
    using (FileStream fileStream = new FileStream(localPath, FileMode.Create))
    {
        fileStream.Write(testBuffer, 0, testBufferSize);
    }
}
int diskWriteTime2 = Environment.TickCount;
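
The elapsed ticks are converted into the per-write averages shown in the tables above; the same calculation is used for every test:

/* Average time in milliseconds per write over the loopCount iterations */
int diskWriteAverageMs = (diskWriteTime2 - diskWriteTime1) / loopCount;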

UploadByteArray test code

Uploading the testBuffer using the UploadByteArray method.

int blobUploadByteArrayTime1 = Environment.TickCount;
for (int i = 0; i < loopCount; i++)
{
    CloudBlob blob = container.GetBlobReference("BlobTestFile.dat");
    blob.UploadByteArray(testBuffer);
}
int blobUploadByteArrayTime2 = Environment.TickCount;

UploadFile test code

Uploading the file written by the local disk write test. The local file has the same size as the testBuffer.


int blobUploadFileTime1 = Environment.TickCount;
for (int i = 0; i < loopCount; i++)
{
    CloudBlob blob = container.GetBlobReference("BlobTestFile.dat");
    /* Reusing the local file, written in the local test*/
    blob.UploadFile(localPath);
}
int blobUploadFileTime2 = Environment.TickCount;

UploadFromStream test code

Uploading the file written by the local disk write test using a FileStream for reading the file. The local file has the same size as the testBuffer.


int blobUploadFromStreamTime1 = Environment.TickCount;
for (int i = 0; i < loopCount; i++)
{
    CloudBlob blob = container.GetBlobReference("BlobTestFile.dat");

    using (FileStream fileStream = new FileStream(localPath, FileMode.Open))
    {
        /* Reusing the local file, written in the local test*/
        blob.UploadFromStream(fileStream);
    }
}
int blobUploadFromStreamTime2 = Environment.TickCount;

OpenWrite test code

Uploading the testBuffer using the OpenWrite method.

int blobOpenWriteTime1 = Environment.TickCount;
for (int i = 0; i < loopCount; i++)
{
    CloudBlob blob = container.GetBlobReference("BlobTestFile.dat");

    using (Stream stream = blob.OpenWrite())
    {
        stream.Write(testBuffer, 0, testBufferSize);
    }
}
int blobOpenWriteTime2 = Environment.TickCount;
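
As Joe Giardino explains in the comments below, OpenWrite buffers data until a block size is reached, uploads the blocks, and commits a block list when the stream is closed, so it only pays off for large blobs. For that scenario the client exposes tuning knobs; a hedged sketch, assuming the v1.x CloudBlobClient properties WriteBlockSizeInBytes and ParallelOperationThreadCount (check your SDK version) and a hypothetical multi-megabyte largeBuffer:

/* Assumed v1.x tuning knobs: smaller blocks plus more threads let
   OpenWrite dispatch several block uploads in parallel */
client.WriteBlockSizeInBytes = 1 * 1024 * 1024; /* 1 MB blocks */
client.ParallelOperationThreadCount = 4;        /* concurrent block uploads */

CloudBlob largeBlob = container.GetBlobReference("LargeBlobTestFile.dat");
using (Stream stream = largeBlob.OpenWrite())
{
    /* largeBuffer: hypothetical payload well above the 32 MB threshold */
    stream.Write(largeBuffer, 0, largeBuffer.Length);
}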

Tags:

.NET | Azure | Blob | C#

Comments (6) -

Joe Giardino
Joe Giardino United States
22-08-2011 19:38:44 #

The reason that OpenWrite appears slower is that it is targeted at large blob sizes. Essentially OpenWrite returns a stream object that buffers data until a given block size is reached, and then pushes that data to the server. Once the stream is closed a put block list is performed. For small blobs this makes no sense, as you pay the pre-buffering cost and the put block list cost. Internally the upload* methods check a threshold to decide if the stream approach should be used (the default is 32 MB). For larger blobs the stream is actually much faster, as it can dispatch many simultaneous uploads in parallel.

I think the answer here is to use the correct method for the scenario. If you don't want to worry about it then use the upload* methods and the internal thresholds will do this for you.

ingvar
ingvar United States
22-08-2011 20:12:17 #

@Joe Giardino, thanks for that excellent explanation! It seems that I should have tested with larger files to be fair to the OpenWrite method. I'll have a look at it again soon!

David
David Italy
28-05-2012 04:05:17 #

Hello Ingvar, move the 'for loop' inside the 'using', and try to write a chunk of data (50K should be fine) instead of one byte after the other ;)

ingvar
ingvar Denmark
31-05-2012 05:47:09 #

Hi David, the point of having the 'for loop' outside is to get 'loopCount' samples and then calculate the average like this: (blobOpenWriteTime2 - blobOpenWriteTime1) / loopCount.

If this was 'production' code you would be very correct in moving the 'for loop' inside the 'using'. But this code is for testing performance :)

Andrei
Andrei Canada
23-07-2012 08:48:08 #

Hi Ingvar,

Thank you for the article, just what I was looking for.
It would be interesting to additionally see two things.
How long it takes for 1 MB, 10 MB, and 100 MB files, as it appears that ms/KB improves with bigger uploads.

Additionally, how it behaves with concurrent writes.

I may end up doing the two things above, and if I do I'll post you the results.

-Andrei

ingvar
ingvar Denmark
24-07-2012 20:12:40 #

Hi Andrei,

Yes, bigger files will reduce the relative overhead. Though I'm not sure by how much :)

Concurrent writes are also a very good option for improving performance!

Hope you get around to doing those tests! I think the results are going to be interesting :)

Martin


About the author

Martin Ingvar Kofoed Jensen

Architect and Senior Developer at Composite on the open source project Composite C1 - C#/4.0, LINQ, Azure, Parallel and much more!

