Comparing Windows Azure Blob Storage and Google Cloud Storage

Welcome, cloud computing fans.
Here is a comparison of Windows Azure Blob Storage and Google Cloud Storage (the author also mentions Amazon S3 along the way).

I thought it would be nice to write an article comparing the storage services of Google App Engine and Windows Azure. In this article we compare Windows Azure Blob Storage and Google Cloud Storage.

The first part of the series: Comparing Windows Azure Table Storage and Amazon DynamoDB
The second part of the series: Comparing Windows Azure Blob Storage and Amazon Simple Storage Service (S3), Part I
The third part of the series: Comparing Windows Azure Blob Storage and Amazon Simple Storage Service (S3), Part II, summary
Abbreviations: Windows Azure Blob Storage is WABS, Google Cloud Storage is GCS, and Amazon S3 is AS3.

Conceptually, WABS and GCS provide the same functionality: both are cloud-based file systems that can store large amounts of unstructured data (usually as files).

Both systems provide a REST API for working with files and folders, plus libraries for high-level languages, which are usually wrappers around the REST API. Each API release has a version: in WABS it is a date, in GCS a number. At the time of writing, the WABS version was 2011-08-18 and the GCS version was 2.0.

Similar functionality in the two systems:

  • Both systems are cloud-based file systems with a two-level hierarchy.
  • Both systems let you store large amounts of data cheaply and for a long time.
  • Both systems let you protect content from unauthorized access.
  • Both systems provide access control mechanisms to protect data: ACLs and Query String Authentication in GCS, ACLs and Shared Access Signatures in WABS.
  • Both systems let you store any number of versions of the original object, but the versioning mechanisms differ.


Concepts



Before we talk more about these two services, I think it is important to clarify some concepts. If you are familiar with basic concepts of WABS and GCS, you can skip this section.

Blob containers and buckets: If these services are file systems in the cloud, think of a WABS blob container or a GCS bucket as a folder or directory. A WABS storage account or a GCS account can have zero or more blob containers or buckets, which in turn contain blobs or objects respectively.

Comments:

  • There is no such thing as nested blob containers or buckets. Both services provide a two-level hierarchy without nesting. However, both systems can create the illusion of a folder hierarchy using prefixes.
  • Restrictions on the number of containers and buckets.
  • Both systems can log resource requests. This function is called “logging” in GCS and “Storage Analytics” in WABS. The difference is that logging works at the bucket level in GCS and at the storage account level in WABS. GCS places log data in a separate user-defined bucket; WABS places it in predefined tables and containers that are created automatically when you enable logging.


Blobs and objects: WABS blobs and GCS objects are the files in your cloud file system, located in blob containers and buckets.

Comments:

  • There is no limit on the number of stored blobs and objects. In GCS the number is simply not documented; in WABS it is bounded by the size of the storage account (100 TB).
  • The maximum size of an object in WABS is 1 TB; in GCS it is not defined.
  • Both systems provide rich functionality for managing blobs and objects. You can copy, upload, download, and perform other operations.
  • Both systems can protect content from unauthorized access, but the ACL mechanism is more granular in GCS, where you can create a separate ACL for each object in a bucket. In WABS everything happens at the blob container level.


The two most important functions are uploading and downloading, so let's discuss them first and then compare the other functions.

Uploading blobs and objects



Let's talk about uploading blobs and objects into containers and buckets. There are two mechanisms: you can upload a blob or object completely within one request, or split it into chunks (blocks or pages in WABS; GCS has no special name for them).

Upload in a single request


If the data to upload is small and you have a good connection, you can upload it in a single request. WABS uses Put Blob for this; GCS uses PUT Object or POST Object.

Upload in chunks


Large data that is inefficient to upload in a single request can be split up. Both systems let you break the data into chunks (blocks or pages in WABS; GCS has no special name for them) and upload them gradually. In WABS you must use Put Block and Put Block List for block blobs and Put Page for page blobs. In GCS, use the POST Object and PUT Object functions.

There are several reasons why you might decide to upload data in chunks:

  • You need to upload very large data. Note that in WABS a block blob is limited to 200 GB and a page blob to 1 TB; in GCS a single object can be up to 5 TB. Such volumes are impractical to upload in a single request.
  • The connection is slow.
  • Both systems are cloud services designed to handle requests from hundreds or thousands of users simultaneously, and both will time out your requests if they run beyond the allowed limit; in WABS the limit is 10 minutes per 1 MB of uploaded data.
  • Splitting large data into chunks lets you upload them in parallel (that is, upload the data faster).
  • If the upload of one chunk fails, you can retry just that chunk; if a single-request upload of large data breaks, you have to start all over again, which is inefficient.
  • System limitations: WABS does not allow uploading more than 64 MB of data in a single request.


Let's see how to upload data in chunks in each system. Suppose you want to upload a 100 MB file in chunks.

WABS


Suppose each chunk is 1 MB (although chunks do not have to be the same size), so you need to upload 100 chunks. Take a block blob, where each block (chunk) has a unique identifier (BlockId). To upload a block, use the Put Block function. A BlockId is a Base64-encoded string whose maximum size is 64 bytes. All BlockIds (100 in our case) must be of the same length. The order in which you upload blocks does not matter, so you can upload them in parallel. After a block is uploaded, WABS puts it somewhere in storage and keeps it for 7 days. After uploading all the blocks, call Put Block List to commit them. Until you call this function the blob cannot be accessed, and if you have not committed the blocks within 7 days, the system deletes them. After the call, WABS reassembles the blob based on the order of the BlockId list and marks it as available. The BlockId values themselves do not matter (they can all be GUIDs), but the order in which you send the BlockId list to Put Block List does.
Limitations:

  • A blob can be divided into a maximum of 50,000 blocks.
  • A blob can have a maximum of 100,000 uncommitted blocks at any given time.
  • The set of uncommitted blocks cannot be larger than 400 GB.
  • All BlockIds of a single blob must be of the same length; for example, block8, block9, block11 is not acceptable.
  • The maximum BlockId length is 64 bytes.
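
As a rough illustration of the BlockId rules above, here is a minimal Python sketch that splits data into blocks and generates Base64-encoded identifiers of equal length. The function names and the `block-000042` naming scheme are hypothetical, chosen only for this example; the sketch does not call the service, it only prepares what would be sent with Put Block and then listed, in order, in Put Block List.

```python
import base64


def make_block_ids(count, width=6):
    """Return `count` Base64 block IDs that all have the same length.

    Zero-padding the index (a hypothetical scheme) keeps every raw ID the
    same size, so the encoded IDs are the same size too.
    """
    ids = []
    for index in range(count):
        raw = f"block-{index:0{width}d}".encode("ascii")
        ids.append(base64.b64encode(raw).decode("ascii"))
    return ids


def split_into_blocks(data, block_size):
    """Cut `data` into consecutive pieces of at most `block_size` bytes."""
    return [data[o:o + block_size] for o in range(0, len(data), block_size)]


def plan_block_upload(data, block_size):
    """Pair each block with its ID, in the order the final list must use."""
    blocks = split_into_blocks(data, block_size)
    return list(zip(make_block_ids(len(blocks)), blocks))
```

The pairs can be uploaded in any order or in parallel; only the order of the IDs in the final committed list determines how the blob is assembled.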


GCS


In GCS, uploading a large file in chunks is called “Resumable Uploads”. First, you tell GCS that you are starting an upload by calling POST Object. This function is usually used to upload a file through an HTML form, but in this case you do not specify a file; instead, you set request headers that tell GCS you are starting an upload. GCS returns a response containing an Upload Id that uniquely identifies the upload. Save this Id, because you will need it when uploading chunks. Next, try to upload the file using PUT Object, passing the Upload Id and the contents of the object. If this succeeds, GCS responds with HTTP code 200 OK. If the operation fails, you have to ask GCS how many bytes were uploaded; GCS returns HTTP code 308 Resume Incomplete. You can then continue uploading the data with PUT Object.

Thoughts:

  • I think we can drop the first PUT Object call, in which we try to upload the entire file in the hope of getting a 200 OK. If I try to upload a 100 MB file, I'm pretty sure it will not go through in one attempt. Instead of trying to upload the whole file, I can skip the first two steps and just upload a chunk of the file, get its status, and then either retry it or upload the next chunk.
  • I'm not sure how you can upload chunks in parallel in GCS, given that GCS returns a Range header when you query the number of uploaded bytes. WABS has a BlockId and AS3 has a part number, which makes parallel uploading easier.
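
To make the resume step more concrete, here is a small Python sketch of how a client might work out where to continue after a 308 response. It assumes the Range header value has the form `bytes=0-N` (the article only says a Range header is returned, so the exact format is an assumption), and the function names are hypothetical.

```python
import re

# Assumed header format: "bytes=0-<last uploaded byte index>".
_RANGE_RE = re.compile(r"bytes=0-(\d+)")


def next_offset(range_header):
    """Return the index of the first byte that still has to be uploaded."""
    if range_header is None or range_header.strip() == "":
        # No Range header: assume nothing has been stored yet.
        return 0
    match = _RANGE_RE.fullmatch(range_header.strip())
    if match is None:
        raise ValueError(f"unexpected Range header: {range_header!r}")
    return int(match.group(1)) + 1


def remaining_chunks(total_size, range_header, chunk_size):
    """List (first_byte, last_byte) pairs for the part not yet uploaded."""
    start = next_offset(range_header)
    return [
        (offset, min(offset + chunk_size, total_size) - 1)
        for offset in range(start, total_size, chunk_size)
    ]
```

Because the server reports a single contiguous range, this scheme is naturally sequential, which matches the author's doubt about parallel chunk uploads in GCS.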


Downloading of blobs and objects



Let's see how you can download blobs and objects. There are two ways: download the entire blob or object in a single request, or download it in chunks.

Each system has only one function for downloading: Get Blob in WABS and GET Object in GCS.

Download in a single request


If the data is small and you have a good connection, you can download the object completely using Get Blob in WABS or GET Object in GCS.


Download in chunks


If the object is large and you're not sure it can be downloaded all at once, you can download it in chunks using the same function with a Range header that specifies the range of bytes to download.

The downloading process:

  1. Determine the size of the object. For example, it is 100 MB.
  2. Determine the chunk size. For example, you prefer to download chunks of 1 MB.
  3. Call Get Blob or GET Object and pass the appropriate values in the Range header. If you download sequentially, the first request will have the header value “0-1048575” (0 to 1 MB), the second “1048576-2097151” (1 to 2 MB), and so on.
  4. After downloading a chunk, save it somewhere.
  5. After downloading all the chunks, create an empty 100 MB file and fill it with the downloaded chunks.
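
The range arithmetic in the steps above can be sketched in Python as follows. The helper names are hypothetical, and the sketch only computes ranges and reassembles chunks in memory; it makes no network calls. The `bytes=` prefix is the standard HTTP form of a Range header value.

```python
def byte_ranges(total_size, chunk_size):
    """Return inclusive (first_byte, last_byte) pairs covering the object."""
    return [
        (offset, min(offset + chunk_size, total_size) - 1)
        for offset in range(0, total_size, chunk_size)
    ]


def range_header(first_byte, last_byte):
    """Format one pair as an HTTP Range header value."""
    return f"bytes={first_byte}-{last_byte}"


def assemble(total_size, chunks):
    """Fill a pre-sized buffer with (first_byte, data) chunks in any order."""
    buffer = bytearray(total_size)
    for first_byte, data in chunks:
        buffer[first_byte:first_byte + len(data)] = data
    return bytes(buffer)
```

Since each chunk carries its own starting offset, chunks can be fetched in parallel and written into place as they arrive.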


Common elements between WABS, AS3 and GCS



All three systems have common elements, for example:

  • All three systems are cloud-based file systems with a two-level hierarchy.
  • All three systems provide a two-level hierarchy (buckets/objects in AS3 and GCS, blob containers/blobs in WABS).
  • All three systems provide a RESTful interface to their services, plus libraries for high-level languages, which are usually wrappers around the REST API.



In common with AS3



When I first read about GCS, I found that GCS and AS3 have a lot in common, for example:

  • Same terminology: both systems use similar terminology, such as bucket and object (in WABS they are called blob container and blob).
  • Same operation names: both systems use the same names for operations. For example, the API function that returns a list of buckets is called GET Service in both systems.
  • Same pricing structure: both systems have a similar pricing structure. In WABS all transactions cost the same; in AS3 and GCS the transaction cost varies depending on the operation.
  • Same hosting style: both support virtual-hosted-style (for example, http://mybucket.s3.amazon.com/myobject) and path-style (for example, http://s3-eu-west-1.amazonaws.com/mybucket/myobject), whereas WABS supports only path-style (for example, http://myaccount.blob.core.windows.net/myblobcontainer/myblob).
  • Similar consistency model: both systems provide a similar consistency model. For example, both provide strong read-after-write consistency for all PUT requests and eventual consistency for all List (GET) operations.


Unique features of GCS



When we began discussing the main functionality, it might have seemed that GCS offers less than WABS and AS3. However, GCS has some features not found on the other platforms. For example:

  • OAuth 2.0 authentication: a unique and modern feature that removes the need for users and applications to provide credentials when they need to access the data. Read more: https://developers.google.com/storage/docs/authentication#oauth.
  • Cookie-based authentication: GCS lets you make authenticated requests from the browser (for those who have no GCS account). To do this, configure an ACL and give the user the URL of the object. Read more: https://developers.google.com/storage/docs/authentication#cookieauth.
  • Cross-Origin Resource Sharing (CORS): another unique and modern feature available only in GCS. Client-side applications are subject to the same-origin policy, which prevents interactions between resources from different origins. That policy blocks not only dangerous behavior but also useful and legitimate interactions between known origins. The CORS specification, developed by the W3C, addresses this, and GCS supports it by letting you configure a bucket to return CORS-compliant responses. Read more: https://developers.google.com/storage/docs/cross-origin. Please note that the feature is marked “Experimental” (in other words, in beta :)). I'm not 100% sure, but it seems the same can be achieved with the $root blob container in WABS.


Pricing



Neither system has capital costs. The pricing model is relatively simple and based on consumption. Both systems bill based on usage, and the bill may consist of three components:

  1. Number of transactions: you pay according to the number of completed transactions; roughly speaking, one transaction is one function call to the system. There is a significant difference between the two systems: in WABS the transaction cost is fixed ($0.01 per 10,000 transactions), while in GCS it varies by transaction type. For PUT, COPY, POST, and LIST transactions you pay a higher price ($0.01 per 1,000 transactions); for GET and others you pay less ($0.01 per 10,000 transactions). Nothing is said about delete requests, but I assume they are free in GCS.
  2. Traffic: you pay for the amount of data transferred in and out of the system. At the time of writing, both systems provide free incoming traffic. GCS does not say whether data transfer within the same data center is charged.
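
To see what the transaction prices quoted above mean in practice, here is a small Python sketch. It uses only the rates given in this article at the time of writing; the function names are hypothetical, and delete requests are left out because the article does not state their price.

```python
from decimal import Decimal

PRICE = Decimal("0.01")  # dollars per pricing unit, as quoted in the article

# GCS operation types billed at the higher rate, per the article.
GCS_HIGHER_RATE_OPS = {"PUT", "COPY", "POST", "LIST"}


def wabs_transaction_cost(count):
    """WABS: flat $0.01 per 10,000 transactions."""
    return Decimal(count) * PRICE / 10000


def gcs_transaction_cost(operation, count):
    """GCS: $0.01 per 1,000 for PUT/COPY/POST/LIST, per 10,000 otherwise."""
    unit = 1000 if operation.upper() in GCS_HIGHER_RATE_OPS else 10000
    return Decimal(count) * PRICE / unit
```

For example, 100,000 PUT requests would come to $0.10 in WABS and $1.00 in GCS under these rates, while 100,000 GET requests cost $0.10 in both.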


Special pricing models are also available, and the two systems offer different payment packages. Read more about pricing: https://www.windowsazure.com/en-us/pricing/details/ for WABS and https://developers.google.com/storage/docs/pricingandterms for GCS.

Functions



The following table summarizes the functions provided by WABS and GCS. It covers the functions the two systems have in common.

Function | WABS | GCS
Create Container/PUT Bucket | Yes | Yes
List Containers/GET Service | Yes | Yes
Delete Container/DELETE Bucket | Yes | Yes
List Blobs/GET Bucket (List Objects) | Yes | Yes
Set Container ACL/PUT Bucket (ACL or CORS) | Yes | Yes
Get Container ACL/GET Bucket (ACL or CORS) | Yes | Yes
Put Blob/PUT Object | Yes | Yes
POST Object | No | Yes
Get Blob/GET Object | Yes | Yes
Delete Blob/DELETE Object | Yes | Yes
Copy Blob/PUT Object | Yes | Yes
Get Blob Properties/HEAD Object | Yes | Yes
Get Blob Metadata/HEAD Object | Yes | Yes

The following table lists the functions supported in WABS.

Function | WABS | GCS
Set Blob Service Properties | Yes |
Get Blob Service Properties | Yes |
Set Container Metadata | Yes |
Get Container Metadata | Yes |
Set Blob Properties | Yes |
Set Blob Metadata | Yes |
Snapshot Blob | Yes |
Lease Blob | Yes |
Put Block | Yes |
Put Block List | Yes |
Get Block List/Parts List | Yes |
Put Page | Yes |
Get Page Ranges | Yes |

Let us consider these functions.

Function | WABS | GCS
Create Container/PUT Bucket | Yes | Yes

This function creates a new blob container or bucket.

An important point to keep in mind is that blob containers are scoped to a storage account, whereas GCS buckets are scoped to a GCS project. When you create a WABS storage account, you define its location (data center), so your containers and blobs live in a specific data center in a specific geographic location. When you create a bucket in GCS, you can choose the region in which to create it, so you can spread buckets across all GCS data centers if needed. To do the same in WABS, you need to create a storage account in each data center where you want to place containers.

There are a few rules for naming blob containers and buckets; they are summarized in the table below.

Rule | WABS | GCS
Min/max name length | 3/63 | 3/63
Case sensitivity | Lower case | Lower case
Allowed characters | Alphanumeric and hyphen (-) | Alphanumeric, hyphen (-) and dot (.)

More rules for naming:
  • Blob container names must begin with a letter or a digit, not a hyphen; a hyphen must be followed by a letter or digit, and consecutive hyphens are not allowed.
  • GCS bucket names must consist of labels separated by dots, where each label begins and ends with a lowercase letter or a digit, and a bucket name must not look like an IP address (e.g. 127.0.0.1).
  • Although bucket names can contain from 3 to 63 characters, a name that contains dots can be up to 222 characters long, counting the dots.
  • Bucket names cannot begin with the prefix goog.
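
The naming rules above translate fairly directly into validation code. The following Python sketch encodes only the rules listed in this article, not the services' full current rule sets, and the function names are hypothetical.

```python
import re

# WABS: lowercase letters/digits, single hyphens only between them.
_WABS_NAME = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
# GCS label: starts and ends with a lowercase letter or digit.
_GCS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")
# Four dot-separated groups of digits look like an IP address.
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_wabs_container_name(name):
    """Check length (3 to 63) and the hyphen rules for a blob container."""
    return 3 <= len(name) <= 63 and _WABS_NAME.fullmatch(name) is not None


def is_valid_gcs_bucket_name(name):
    """Check length, the goog prefix, IP-like names, and each label."""
    max_length = 222 if "." in name else 63
    if not 3 <= len(name) <= max_length:
        return False
    if name.startswith("goog") or _IP_LIKE.fullmatch(name):
        return False
    return all(_GCS_LABEL.fullmatch(label) for label in name.split("."))
```

Checking names on the client side like this saves a round trip, but the service remains the final authority on what it accepts.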


Notes:

  • When creating a container or bucket you can optionally set the ACL; if it is not specified, the container or bucket is private, i.e. accessible only to the owner. In GCS, if you do not define an ACL at creation time, the system default ACL is used. Read more: https://developers.google.com/storage/docs/accesscontrol#default. You can change the ACL or configure CORS on the bucket after it is created.
  • WABS lets you define custom metadata for a container: a collection of key-value pairs with a maximum size of 8 KB. This is not available in GCS.

Function | WABS | GCS
List Containers/GET Service | Yes | Yes


The function returns a list of all blob containers in the storage account or, in GCS, all buckets that belong to the authenticated owner.

Comments:

  • A single call to this function in WABS returns at most 5,000 containers; if the storage account has more, a continuation token is returned as well. By default WABS returns up to 5,000 containers, but you may specify fewer. The GCS documentation does not mention a limit.
  • In WABS, you can filter on the server side by a prefix, so that only containers whose names begin with it are returned.
  • In WABS, you can specify whether to return the metadata for the blob containers.
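
The continuation-token pattern mentioned above usually ends up as a simple loop on the client. Here is a generic Python sketch; `fetch_page` is a hypothetical callable standing in for one listing request, and the fake service exists only so the example can run without a network.

```python
def list_all(fetch_page):
    """Collect every item by following continuation tokens.

    `fetch_page(token)` must return (items, next_token), where next_token
    is None (or empty) once there are no more pages.
    """
    items = []
    token = None
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if not token:
            return items


def make_fake_service(names, page_size):
    """Build an in-memory stand-in that pages through `names`."""
    def fetch_page(token):
        start = int(token) if token else 0
        end = start + page_size
        next_token = str(end) if end < len(names) else None
        return names[start:end], next_token
    return fetch_page
```

With a real client, `fetch_page` would issue the listing call with the token from the previous response and return the parsed container names together with the new token.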

Function | WABS | GCS
Delete Container/DELETE Bucket | Yes | Yes

The function deletes a blob container or bucket.

Comments:

  • This may look like a synchronous operation, but in reality it is not. When you send a request to delete a blob container, it is marked for deletion and becomes unavailable, and is then removed during garbage collection, so the actual deletion time depends on the amount of data in the container. In my experience, deleting a very large container can take hours, and during this time an attempt to create a container with the same name results in an error (Conflict, HTTP 409). You need to plan for this.
  • In GCS the bucket must be empty before it can be deleted. You must first remove all objects from the bucket, then delete it. Otherwise a 409 Conflict error is returned.

Function | WABS | GCS
List Blobs/GET Bucket (List Objects) | Yes | Yes

The function retrieves a list of blobs or objects within a container or bucket. It works the same way in both systems:

  • Both functions let you limit the result to a desired number of objects.
  • Both functions have a maximum number of objects they can return in a single call: 5,000 in WABS, 1,000 in GCS.
  • Both functions support delimiters, which are characters used to group blobs or objects. The most common delimiter is /. As mentioned above, both systems support a two-level hierarchy, and a delimiter can create the illusion of a folder hierarchy. For example, suppose you have the following objects: images/a.png, images/b.png, images/c.png, logs/1.txt, logs/2.txt, files.txt. When you call the function and pass it the delimiter /, both systems return the following values: images, logs, files.txt.
  • Both functions support server-side filtering using prefixes. When your request has a prefix, both systems return the objects whose names start with that prefix. In the example above, if we pass the prefix “images” without a delimiter, both systems return: images/a.png, images/b.png, images/c.png.
  • Both functions accept a marker, which is in effect a continuation token telling the system where to start listing objects.
  • Both systems return objects in alphabetical order.
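
The prefix and delimiter behavior described above can be imitated locally with a few lines of Python. This is only a sketch of the grouping logic, not either service's implementation; the function name is hypothetical, and unlike real responses (which may keep the trailing delimiter on grouped names) it returns the bare group name to match the article's example.

```python
def list_names(names, prefix="", delimiter=None):
    """Emulate a flat listing filtered by prefix and grouped by delimiter."""
    results = []
    seen_groups = set()
    for name in sorted(names):  # both systems list alphabetically
        if not name.startswith(prefix):
            continue
        if delimiter:
            rest = name[len(prefix):]
            cut = rest.find(delimiter)
            if cut != -1:
                # Everything up to the delimiter collapses into one entry.
                group = prefix + rest[:cut]
                if group not in seen_groups:
                    seen_groups.add(group)
                    results.append(group)
                continue
        results.append(name)
    return results
```

Calling it with the article's six object names and the delimiter `/` yields the two "folders" plus `files.txt`; calling it with the prefix `images` and no delimiter yields the three image objects.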


Differences:

  • A single call returns at most 5,000 blobs in WABS and 1,000 objects in GCS.
  • When listing, you can tell WABS to return blob snapshots as well. In GCS this cannot be done.
  • When listing, you can tell WABS to return the metadata for the blobs. GCS does not return object metadata in the listing; you have to use HEAD Object.
  • When listing, you can tell WABS to return blobs that are not yet committed, i.e. partially uploaded. GCS returns only objects that are fully uploaded.
  • In GCS, you can use this function to get the ACL or CORS configuration for a bucket.

Function | WABS | GCS
Set Container ACL/PUT Bucket (ACL or CORS) | Yes | Yes

The function sets the ACL for containers or buckets; in WABS it can also specify one or more access policies. In GCS it can also configure CORS (but you cannot configure CORS and the ACL in one request).

For a blob container, the ACL values can be:



For a bucket, the ACL values can be:

  • READ: allows retrieving a list of objects in the bucket.
  • WRITE: allows creating, overwriting, and deleting objects in the bucket.
  • FULL_CONTROL: grants READ and WRITE permissions.


The convenience of GCS is that you can give different users different sets of permissions; for example, user1 can have the READ ACL and user2 the WRITE ACL. WABS has no such flexibility: permissions are set only on the blob container.

The convenience of WABS is that, in addition to the ACL, you can specify up to 5 access policies on a container, each defining a time-limited set of permissions for that container. For example, you can create an access policy with write permission on a blob container that is valid for only one day. Policies let you generate a special signed URL and give it to users (the flexible Shared Access Signatures functionality). Signatures let you grant access rights to containers and blobs at a more granular level for a limited time.

Function | WABS | GCS
Get Container ACL/GET Bucket (ACL or CORS) | Yes | Yes

The function retrieves the ACL for a blob container or bucket; in WABS it also returns the access policies defined for the container.

To obtain the ACL of a bucket, call GET Bucket with the query string parameter “acl”; to obtain the CORS configuration, use the query string parameter “cors”. If neither is given, the function returns a list of objects in the bucket.

Function | WABS | GCS
Put Blob/PUT Object | Yes | Yes

The function adds a blob to a blob container or an object to a bucket. In GCS, this function can also be used to set the ACL of an existing object or to copy an object from one bucket to another.

Comments:

  • In both systems, the function overwrites an existing object with the specified name.
  • Both systems let you define properties for the objects (cache control, content type, etc.).
  • Both systems let you send an MD5 hash of the content to check data integrity.
  • In GCS you can set an ACL on the object when you create it, which you cannot do in WABS.
  • Both systems let you specify metadata for blobs and objects as a collection of key-value pairs. In WABS the maximum metadata size is 8 KB; in GCS it is unknown.
  • When you create a page blob with this function, you only initialize the page blob without putting data into it. To insert data, you must use the Put Page function.
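
As a small illustration of the MD5 integrity check mentioned in the comments above, here is a Python sketch that computes a content hash in the Base64 form conventionally used for the HTTP Content-MD5 header. The article does not say which header or encoding each service expects, so treat that as an assumption; the function names are hypothetical.

```python
import base64
import hashlib


def content_md5(data):
    """Return the Base64-encoded MD5 digest of `data` (bytes)."""
    digest = hashlib.md5(data).digest()
    return base64.b64encode(digest).decode("ascii")


def matches(data, expected_md5):
    """Check received data against a previously computed hash."""
    return content_md5(data) == expected_md5
```

A client would compute the value before uploading and send it along with the request, so that the service can reject data that was corrupted in transit.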

Article based on information from habrahabr.ru
