Sunday, January 10, 2010

run your own SQS speed-test

A tweet by Shlomo Swidler caught my eye this morning:
@guyro Using typica library I was able to get 12 SQS enqueues per second per thread.

It got my attention because in jclouds we claim to have decent performance. Now's the time to put that claim to the test.

In order to prepare, I had some homework to do. Firstly, we hadn't yet implemented SQS. After about 3 hours, I created the new client (not 100% finished, but enough to start).

Next, I needed to set up a test case that was sure to get top performance. For this, I completely turned off logging and switched to our asynchronous interface. Later in this post, I'll show you how this test was coded. For now, let's run it!

Here's what you need to do:
  1. Set up an instance of ec2 running ubuntu to make it easy (ami-7e28ca17)
  2. Install java (apt-get install -y openjdk-6-jdk)
  3. Download the test jar (wget http://jclouds.googlecode.com/files/jclouds-speedtest-sqs.jar)
  4. Run the test (java -Xms128m -Xms128m -jar jclouds-speedtest-sqs.jar id secret queueName 250)


Here's an example of the output I received from an m1 small instance in the us-east region:

ubuntu@ip-10-242-197-159:~$ java -Xms128m -Xms128m -jar jclouds-speedtest-sqs.jar id key queueName 250
creating queue: queueName in region eu-west-1
creating queue: queueName in region us-east-1
creating queue: queueName in region us-west-1
COMPLETE: context: default, region: us-east-1, rate: 70.741370 messages/second
pausing 5 seconds before the next run
COMPLETE: context: default, region: eu-west-1, rate: 82.372323 messages/second
pausing 5 seconds before the next run
COMPLETE: context: default, region: us-west-1, rate: 104.777871 messages/second
pausing 5 seconds before the next run
deleted queue: queueName in region us-east-1
deleted queue: queueName in region us-west-1


Now, look at the performance from a c1 medium instance in the us-east region:

ubuntu@ip-10-244-243-82:~$ java -Djclouds.enterprise -Xms128m -Xms128m -jar jclouds-speedtest-sqs.jar id key testq 5000
creating queue: testq in region eu-west-1
creating queue: testq in region us-east-1
creating queue: testq in region us-west-1
COMPLETE: context: enterprise, region: us-west-1, rate: 487.092060 messages/second
pausing 5 seconds before the next run
COMPLETE: context: enterprise, region: us-east-1, rate: 720.357297 messages/second
pausing 5 seconds before the next run
COMPLETE: context: enterprise, region: eu-west-1, rate: 579.642940 messages/second
pausing 5 seconds before the next run
deleted queue: testq in region eu-west-1
deleted queue: testq in region us-east-1
deleted queue: testq in region us-west-1

Crazy difference, huh?! Unsatisfied, I turned up the volume to the highest. Here, I use the c1 xlarge instance and have to raise my file limit in preparation for tons of i/o.

root@domU-12-31-39-0C-A9-81:~# ulimit -n 10240
root@domU-12-31-39-0C-A9-81:~# java -Djclouds.enterprise -Xms256m -Xms256m -jar jclouds-speedtest-sqs.jar id key testq1 5000
creating queue: testq1 in region eu-west-1
creating queue: testq1 in region us-east-1
creating queue: testq1 in region us-west-1
COMPLETE: context: enterprise, region: eu-west-1, rate: 662.866234 messages/second
pausing 5 seconds before the next run
COMPLETE: context: enterprise, region: us-east-1, rate: 1594.387755 messages/second
pausing 5 seconds before the next run
COMPLETE: context: enterprise, region: us-west-1, rate: 1252.191335 messages/second
pausing 5 seconds before the next run
deleted queue: testq1 in region eu-west-1
deleted queue: testq1 in region us-east-1
deleted queue: testq1 in region us-west-1


The above performance is obtained with the standard java cached Executor and could probably be tuned to do much better.
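
As a starting point for that tuning, here is a minimal sketch using only the JDK. It is not what the speed test does: it swaps the unbounded cached pool for a bounded `ThreadPoolExecutor`, and the class name, pool size and queue capacity are assumptions picked for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPoolSketch {

    // Builds a fixed-size pool with a bounded work queue. When the queue is
    // full, CallerRunsPolicy makes the submitting thread run the task itself,
    // which slows the producer down instead of spawning more threads.
    public static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Runs taskCount trivial tasks on a bounded pool and returns how many ran.
    public static int runTasks(int taskCount) {
        ThreadPoolExecutor pool = newBoundedPool(4, 16); // illustrative sizes
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < taskCount; i++) {
            pool.execute(() -> done.incrementAndGet());
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println("tasks run: " + runTasks(100));
    }
}
```

Whether a bounded pool actually beats the cached one here depends on the instance size and the network, so treat this as something to measure, not a recommendation.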

I'd like to see results from different sizes and launched from different regions. Please let me know, if you get interesting results.

Dirty Details


Completely switching off logging isn't something I tend to recommend. However, if you want to see how we do it, here's what that code looks like:

context = SQSContextFactory
.createContext(System.getProperties(), accesskeyid, secretkey,
new NullLoggingModule());

Here, you'll see us configuring the SQS client, and also ensuring that we can override defaults using normal java system properties. The last bit overrides the normal logging system to essentially no-op.

Now, to make the test efficient, we need to use the asynchronous interface, which lets jclouds manage threads in the best way possible. Here's how it looks to fire off a bunch of messages using the async interface:

// fire off all the messages for the test
Set<ListenableFuture<?>> responses = Sets.newHashSet();
for (int i = 0; i < messageCount; i++) {
   responses.add(context.getAsyncApi().sendMessage(queue, message));
}

Here, we collect the deferred responses as we fire. Firing a few hundred messages takes less than a second. However, they are all still in progress and are now limited by processing, memory and network contention.

Next, we loop over the responses and check them until they are all done or we run out of time.

do {
   Set<ListenableFuture<?>> retries = Sets.newHashSet();
   for (ListenableFuture<?> response : responses) {
      try {
         // wait briefly; a timeout just means "check again on the next pass"
         response.get(100, TimeUnit.MILLISECONDS);
         complete++;
      } catch (ExecutionException e) {
         System.err.println(e.getMessage());
         errors++;
      } catch (TimeoutException e) {
         retries.add(response);
      }
   }
   responses = Sets.newHashSet(retries);
} while (responses.size() > 0 &&
         System.currentTimeMillis() < start + timeOut);
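
If you want to play with the fire-then-poll pattern without an AWS account, here is a rough sketch that uses only the JDK. The fake "send" task, the class name and the `rate` helper are assumptions standing in for the jclouds calls; only the shape of the loop mirrors the test.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PollingSketch {

    // Submits messageCount fake "send" tasks, then polls the futures until all
    // are done or timeOutMillis has elapsed. Returns the number completed.
    public static int sendAndPoll(int messageCount, long timeOutMillis) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            Set<Future<String>> responses = new HashSet<Future<String>>();
            for (int i = 0; i < messageCount; i++) {
                final int id = i;
                responses.add(executor.submit(() -> {
                    Thread.sleep(5); // stand-in for the network round trip
                    return "message-" + id;
                }));
            }
            long start = System.currentTimeMillis();
            int complete = 0;
            do {
                Set<Future<String>> retries = new HashSet<Future<String>>();
                for (Future<String> response : responses) {
                    try {
                        response.get(100, TimeUnit.MILLISECONDS);
                        complete++;
                    } catch (ExecutionException e) {
                        System.err.println(e.getMessage());
                    } catch (TimeoutException e) {
                        retries.add(response); // not done yet, check again
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return complete;
                    }
                }
                responses = retries;
            } while (!responses.isEmpty()
                    && System.currentTimeMillis() < start + timeOutMillis);
            return complete;
        } finally {
            executor.shutdownNow();
        }
    }

    // Messages per second for a run that completed `complete` sends.
    public static double rate(int complete, long elapsedMillis) {
        return complete * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int complete = sendAndPoll(250, 10000);
        long elapsed = Math.max(1, System.currentTimeMillis() - start);
        System.out.printf("rate: %f messages/second%n", rate(complete, elapsed));
    }
}
```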



The rest of the code is normal stuff. Feel free to check out the real deal here.

I hope you enjoy this demonstration, and please do tweet me, if you have other ideas!

Saturday, January 9, 2010

asynchronous workflows in jclouds

Those of you who know jclouds also know that we have a focus on asynchronous calls. Asynchronous methods return a deferred result, which is called a Future in Java. With futures, you can kick off a whole bunch of long-running commands in what appears to be a single thread of control. Then, you can check on the status later. Here's an example of how to create 100 blobs in a blobstore and then check later.

// get a reference to an async blobstore like Amazon S3
AsyncBlobStore blobStore = context.getAsyncBlobStore();

// a container to hold the deferred results
Set<Future<?>> etags = Sets.newHashSet();

// fire off 100 sheep in parallel
for (int sheepCount = 1; sheepCount <= 100; sheepCount++) {
   Blob sheep = blobStore.newBlob("sheep" + sheepCount);
   sheep.setPayload("my number is " + sheepCount);
   etags.add(blobStore.putBlob("meadow", sheep));
}

// let's wait here until they are all done.
for (Future<?> etag : etags) {
   // this will throw an exception if there was a problem
   etag.get(2, TimeUnit.SECONDS);
}

Truly asynchronous workflows are more difficult. For example, you might want to notify someone when your server is created, or rerender a movie you downloaded so it fits on your iPod.

Sadly, the Java Future object has very little to offer for scenarios like these. It doesn't have a hook to chain events. However, jclouds uses an alternative, courtesy of Google's Guava library: ListenableFuture.

ListenableFutures can be chained together such that when the first command completes, something else takes place without you having to stick around to ensure it happens. With this, you can do fancy things like the following:

// get a reference to an async blobstore like Microsoft Azure
AsyncBlobStore blobStore = context.getAsyncBlobStore();

// let's export our whole company
Blob myEntireCompany = blobStore.newBlob("myentirecompany");
myEntireCompany.setPayload(hugeAmountOfData);

// using the async interface, we always get deferred results
ListenableFuture<?> etag = blobStore.putBlob("isleofman", myEntireCompany);

// publish a message via rabbitMQ AMQP when the big push is done
etag.addListener(new Runnable() {
   public void run() {
      try {
         channel.basicPublish(exchangeName, routingKey, null,
               "hire a fancy tax accountant".getBytes());
      } catch (IOException e) {
         // basicPublish declares IOException, which run() cannot rethrow
         throw new RuntimeException(e);
      }
   }
}, sameThreadExecutor());

// notice I don't have to wait here.. the publish is now going
// to magically invoke itself when the upload is finished.
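
To try the chaining idea without Guava, RabbitMQ or a cloud account, here is a small sketch using the JDK's `CompletableFuture` (Java 8 and later). This is a swapped-in technique, not the jclouds or Guava API: the fake upload, the `outbox` list and the class name are assumptions for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class ChainingSketch {

    // Starts a fake upload and registers a follow-up action that runs when the
    // upload completes. Returns a future that completes after the follow-up.
    public static CompletableFuture<Void> uploadThenNotify(String payload, List<String> outbox) {
        CompletableFuture<String> etag = CompletableFuture.supplyAsync(
                () -> Integer.toHexString(payload.hashCode())); // stand-in for the real upload
        // the callback plays the role of the listener: nobody has to block for it
        return etag.thenAccept(tag -> outbox.add("upload finished, etag=" + tag));
    }

    public static void main(String[] args) {
        List<String> outbox = new CopyOnWriteArrayList<String>();
        CompletableFuture<Void> done = uploadThenNotify("huge amount of data", outbox);
        // join() is only here so the demo prints before the JVM exits
        done.join();
        System.out.println(outbox);
    }
}
```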

In our development branch of jclouds (trunk), every service returns ListenableFutures. Be sure to let us know what you do with them!

Wednesday, December 30, 2009

jclouds, ec2, and alestic: switch on the lamp

alestic has been improving the Amazon EC2 world: providing great images, advice and tools.

One post I found particularly interesting was one by Eric Hammond on runurl. So much so, in fact, that I've decided to showcase jclouds' new EC2 support by using runurl to create and boot a LAMP stack.

In order to perform the complete example, you'll need jclouds-scriptbuilder and jclouds-aws components. Here's a link to a maven pom that will set you up. You can also refer to our EC2 Quick Start wiki.

In order to boot a LAMP stack in EC2 using runurl, you must do at least the following four things:
  1. create or locate a keypair
  2. create or locate a security group to allow http access
  3. create a runurl script
  4. run the instance
I'll take this step-by-step:

Connect and Create a KeyPair

Creating a keypair is a standard ec2 command to make ssh credentials. Here's how you do it in jclouds:


// connect and get a synchronous client
EC2Client client = EC2ContextFactory.createContext(accesskeyid, secretkey).getApi();

// create the keypair
KeyPair keyPair = client.getKeyPairServices()
      .createKeyPairInRegion(Region.DEFAULT, keyPairName);

Note that in jclouds every command is "region aware." In this way, you don't need to create different clients for each region you may have instances in.

Create and Authorize a SecurityGroup to HTTP

In order to connect to your instance, you must create a security group that allows traffic through. The following will punch a hole allowing everyone access to ports 22, 80, and 443:

client.getSecurityGroupServices().createSecurityGroupInRegion(Region.DEFAULT, name, securityGroupName);

for (int port : new int[] { 80, 443, 22 }) {
client.getSecurityGroupServices().authorizeSecurityGroupIngressInRegion(Region.DEFAULT,
name, IpProtocol.TCP, port, port, "0.0.0.0/0");
}

Create a runurl script

In order to install the lamp stack, I've chosen to use runurl. Here, we use a jclouds tool called scriptbuilder to generate a script that will execute when the instance starts:

String script = new ScriptBuilder() // lamp install script
      .addStatement(exec("runurl run.alestic.com/apt/upgrade"))
      .addStatement(exec("runurl run.alestic.com/install/lamp"))
      .build(OsFamily.UNIX);
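
If you are curious what a builder like this boils down to, here is a toy version. It is not the jclouds ScriptBuilder, and the real tool's output almost certainly differs: `MiniScriptBuilder`, its methods and the `#!/bin/bash` header are all assumptions made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScriptBuilder {
    private final List<String> statements = new ArrayList<String>();

    // Queues one shell statement and returns this builder for chaining.
    public MiniScriptBuilder addStatement(String statement) {
        statements.add(statement);
        return this;
    }

    // Renders the statements, one per line, under a shebang header.
    public String build() {
        StringBuilder script = new StringBuilder("#!/bin/bash\n");
        for (String statement : statements) {
            script.append(statement).append('\n');
        }
        return script.toString();
    }

    public static void main(String[] args) {
        String script = new MiniScriptBuilder()
                .addStatement("runurl run.alestic.com/apt/upgrade")
                .addStatement("runurl run.alestic.com/install/lamp")
                .build();
        System.out.print(script);
    }
}
```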

Run the instance

Running the instance is basically just choosing the correct options. The following will run the instance with the smallest possible size:

Reservation reservation = client.getInstanceServices().runInstancesInRegion(Region.DEFAULT,
      null, // allow ec2 to choose an availability zone
      "ami-ccf615a5", // alestic ami allows auto-invoke of user data scripts
      1, // minimum instances
      1, // maximum instances
      asType(InstanceType.M1_SMALL) // smallest instance size
            .withKeyName(keyPairName) // key I created above
            .withSecurityGroup(securityGroupName) // group I created above
            .withUserData(script.getBytes())); // script to run as root

RunningInstance instance = Iterables.getOnlyElement(reservation.getRunningInstances());

Finishing up

RunningInstance is a bad name, since the instance is still booting. There are many ways to block for an instance to be available. The full example shows you one way, but you can also just poke around in elasticfox or the like.
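
One generic way to block is to poll the instance state until it reads "running" or you give up. The sketch below is not the code from the full example: the `Supplier` standing in for the EC2 describe call, the state strings and the class name are assumptions.

```java
import java.util.function.Supplier;

class WaitForStateSketch {

    // Polls stateSupplier until it reports the wanted state or the timeout
    // passes. Returns true if the state was reached in time.
    public static boolean awaitState(Supplier<String> stateSupplier, String wanted,
                                     long timeoutMillis, long pollMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (wanted.equals(stateSupplier.get())) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Fake instance that reports "pending" twice before "running".
        final int[] calls = {0};
        Supplier<String> fakeInstance = () -> ++calls[0] < 3 ? "pending" : "running";
        System.out.println("ready: " + awaitState(fakeInstance, "running", 5000, 10));
    }
}
```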

The most important info you need is the instanceid and the credentials to connect. Here's how to get that from jclouds.

System.out.printf("instance %s%n", instance.getId());
System.out.printf("login identity:%n%s%n", keyPair.getKeyMaterial());


I hope you enjoyed this quick how-to. Look forward to one on how to make an EBS-backed AMI soon.

Cheers,
-Adrian

Tuesday, December 22, 2009

preview: jclouds Terremark vCloud Express Support

We've been working for the last couple of months on compute support in jclouds. The latest milestone we have is support for Terremark vCloud Express. Our API has 100% feature coverage, although it is very much a beta. That said, this time we have built tools and integration, including Dasein and ant support. Have a look at where we are and let us know what you think!

http://code.google.com/p/jclouds/wiki/Terremark

Wednesday, November 25, 2009

jclouds ecosystem roundup 1

Today, I called together an impromptu chat on jclouds integration on irc (#jclouds on freenode). The goal was to have contributors and stakeholders discuss what they're working on, and generally get in touch with each other. We were fairly well represented: 10 in the room by the end of it, coming in from all over the US, Norway, Romania, and New Zealand. Geography aside, we were very diverse, with folks from jclouds, terracotta, jboss, rackspace, rimuhosting, enStratus/Dasein, and vCloud online. I'll sum up the topics we covered, for those of you who missed it. Here it goes!

ShrinkWrap
Presented by aslak

ShrinkWrap is an open source, general purpose Archive API with a fluent'ish sound to it. It has an Extension model that lets you write specific Archive types like JavaArchive, WebArchive, etc. It is a part of the JBoss ecosystem and in use in the Embedded AS and Arquillian projects.

Here's how it looks to use ShrinkWrap:

Archives.create("app.war", WebArchive.class).addPackage(recursive, Package.getPackage("org.jclouds")).as(ZipExporter.class).export(Stream)

One integration idea is to utilize ShrinkWrap to move or create configurations on cloud storage. Here's how that might look:

Archives.create("test",ZipImporter).import(Jclouds.OsStream).as(OSConfiguration.class).setMemory(1G).as(ZipExport.class).export(JClouds.OsStream)

jclouds is very excited about the ShrinkWrap story and the potential to use it to freeze and retarget cloud workloads. Thanks to Aslak for presenting!

enStratus and Dasein Cloud
Presented by nspollution

enStratus is an enterprise-focused cloud broker with a console to control multiple clouds. By enterprise, we mean that it ensures IT policies and procedures can be enforced in the cloud. These include key management, user management, data encryption, SLA management, backups, DR, and BCP. It uses an open source project, Dasein Clouds, to facilitate multi-cloud abstraction. Dasein Clouds follows the JDBC model: it prescribes a series of interfaces for virtual hardware and network services. Then, implementations are created for different APIs.

Here's how it looks to find all servers in a specific region using Dasein:

cloudProvider.getServerServices(region).list();

George from Dasein is currently collaborating with jclouds to facilitate some of its abstractions, such as compute and storage. This is currently underway and has a goal of adding vCloud support to Dasein.
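
For readers who haven't met the JDBC model before, here is a toy illustration of the idea: callers code against one interface, and each cloud supplies its own implementation, much like a JDBC driver. None of these types are Dasein's; the interface, the two fake clouds and the server names are assumptions invented for the sketch.

```java
import java.util.Arrays;
import java.util.List;

class ProviderModelSketch {

    // The shared contract, analogous to a JDBC interface such as Connection.
    interface ServerServices {
        List<String> list();
    }

    // One implementation per cloud API, analogous to a JDBC driver.
    static class FakeCloudA implements ServerServices {
        public List<String> list() {
            return Arrays.asList("a-1", "a-2");
        }
    }

    static class FakeCloudB implements ServerServices {
        public List<String> list() {
            return Arrays.asList("b-1");
        }
    }

    // Caller code depends only on the interface, never on a concrete cloud.
    public static int countServers(ServerServices services) {
        return services.list().size();
    }

    public static void main(String[] args) {
        for (ServerServices cloud : new ServerServices[] { new FakeCloudA(), new FakeCloudB() }) {
            System.out.println(cloud.getClass().getSimpleName() + ": " + cloud.list());
        }
    }
}
```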

RimuHosting
Presented by ivan__

RimuHosting offers cloud servers, or VPSs, via a ReST api. They also offer dedicated servers. Dedicated servers can be traditional or configured with a Xen stack, in which case Rimu can manage your VPSs for you. One differentiator they have is scaling a single server. For example, someone can start on a 160mb vps, and then eventually grow that to a 16-core dedicated server.

Ivan Meredith has been highly involved in the OSS cloud world, and jclouds is the 3rd project he's written support for: first libcloud, then deltacloud. Ivan believes jclouds is unique in allowing for a simple api, yet still allowing Rimu to wrap up features that are only used 1% of the time.

The Rackspace Cloud
Presented by greenisus

Rackspace is pretty popular and offers the cloud norms such as spin up instances, resize, reboot, etc. Their offering also includes CDN and storage management. The most echoed differentiator from the chatroom is the quality of the ReST apis they present. Rackspace apis were some of the first jclouds features and we foresee that strong relationship carrying forward.

Mike Mayo wrote the Rackspace Cloud iPhone application and will be soon building an android application for their cloud. He's a contributor to jclouds and an adviser on our android integration project.

jclouds-android

Presented by mihaicampean and bogdan_popa

Mihai has been taking an interest in android since last year's 1.0 release. In this time, he's created two android projects: aegis-shield and jclouds-android. jclouds-android is a proof of concept application. When complete, it will be a twitter client which runs on the jclouds framework.

Terracotta
Presented by jvoegele

Terracotta is java infrastructure software that extends the java memory model across a cluster. This allows for somewhat transparent clustering of java applications. Among other use cases, testing shows they have by far the fastest hibernate 2nd level cache to be found anywhere. They recently acquired ehcache and its primary maintainer Greg Luck, as well as the quartz scheduler project. Under Terracotta, these projects will have tighter integration and cluster readiness.

Terracotta are working with jclouds to create tools that dynamically provision terracotta server and client nodes. We are also working with the cargo project on this. The goal is to make it easier for those who love Terracotta to manage it inside or outside of the cloud, by improving tools they already use.

vCloud Express
Presented by wattersjames

VMware vCloud, unlike other clouds, is more of a software-based offering. vCloud Express is offered by various hosting companies, such as Terremark and Hosting.com. They see jclouds as unique because it is not only a translation layer for their API, but is also more directly programmable and integrated right into Java.

VMware have been quite supportive of the jclouds project, from sponsoring time to facilitating discussions with the ecosystem. James has been a great part of the jclouds story and has also helped promote collaborative projects.


Concluding Notes

We've invited the entire ecosystem, including those who couldn't make this irc session, to participate in our new Compute Abstraction Design. Please also reach out to us if you have a jclouds integration story, or would like help getting your jclouds story started.

Saturday, October 31, 2009

Save your tweets forever with jclouds!

I've created a new jclouds toy using features that will be released any day now in beta-3.

It is called jclouds-tweetstore. It pulls tweets about @jclouds and stuffs them into storage clouds.

It runs inside google appengine and is currently connected to three storage clouds: S3, Azure, and Rackspace.

While this is just a demo, it does prove you can integrate multiple clouds together... and save any feedback about you forever ;)

Go ahead and click

Here's how it works:

Every minute, a google appengine cron requests tweets referencing jclouds (yes.. via jclouds-twitter rest client).

allAboutMe = twitter.getMyMentions()

These tweets are stored as blobs into storage clouds via the jclouds blobstore api.

for (BlobMap map : maps) {
   for (Status status : allAboutMe) {
      map.put(status.getId() + "", new StatusToBlob(map).apply(status));
   }
}

Whenever someone clicks the link above, jclouds pulls all the tweets directly from all configured clouds and displays them in a poorly formatted table.

Blob blob = map.get(id);
from = blob.getMetadata().getUserMetadata().get(TweetStoreConstants.SENDER_NAME);
tweet = Utils.toStringAndClose((InputStream) blob.getData());
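
To see the store-everywhere, read-from-anywhere flow in isolation, here is a sketch where plain in-memory maps stand in for the cloud-backed maps. It assumes nothing about the real BlobMap or Status types: the class name, the `Long` tweet ids and the string payloads are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TweetStoreSketch {

    // Writes every tweet (id -> text) into every store, keyed by the id as a string.
    public static void storeAll(Map<Long, String> tweets, List<Map<String, String>> stores) {
        for (Map<String, String> store : stores) {
            for (Map.Entry<Long, String> tweet : tweets.entrySet()) {
                store.put(tweet.getKey() + "", tweet.getValue());
            }
        }
    }

    public static void main(String[] args) {
        Map<Long, String> tweets = new LinkedHashMap<Long, String>();
        tweets.put(1L, "@jclouds nice speed test");
        tweets.put(2L, "@jclouds when is beta-3 out?");

        // three in-memory maps standing in for three storage clouds
        List<Map<String, String>> stores = new ArrayList<Map<String, String>>();
        for (int i = 0; i < 3; i++) {
            stores.add(new HashMap<String, String>());
        }

        storeAll(tweets, stores);
        // any store can now answer a lookup by id
        System.out.println(stores.get(2).get("1"));
    }
}
```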

This is all in the cloud, and almost free (there are light charges between google and the storage clouds).

I hope you enjoy and are excited about the nearly complete jclouds-beta-3!
-Adrian

Wednesday, October 14, 2009

ReST be nice.. no entities please!

When modeling a ReST endpoint, we often need to make an explicit decision about how to address commands. A command is an action on a resource. We often need to provide options for how to operate this command. In many cases, options to commands are very simple: response rendering, callbacks, limits, boolean values. This post is about commands like these.

In the spirit of harvesting as much as possible from the HTTP protocol, we try to make use of existing features to describe simple commands. Below, I'll show a few ways we can kill a robot slowly:
  • Slow Death by Matrix
    • POST http://localhost:8080/objects/robot/action/kill;death=slow HTTP/1.1
  • Slow Death by Headers
    • POST http://localhost:8080/objects/robot/action HTTP/1.1
      x-service-action: kill
      x-service-options: death=slow
  • Slow Death by Entity
    • POST http://localhost:8080/objects/robot/action HTTP/1.1
      Content-Length: 25

      {"kill":{"death":"slow"}}
I'll argue that death by Matrix or Entity are better choices, because they are modeled in a way that keeps the action and its arguments visually coupled. To narrow down further between these choices, I'll introduce the normal chaos that exists in ReST resources.

Cloud chaos exists en masse, but for the sake of choosing your model, there are two major concerns: retries and redirects. For example, Amazon S3 sends responses back to you telling you to slow down. Lock-based systems will often send a code 409, which means there's a temporary conflict. In cases like these, the server has not processed your request, and you should replay it. In the case of redirects, the server will respond with information regarding the correct location of the resource. Again, your job is to replay the request.
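
Here is a bare-bones sketch of what "replay it" means in code. It is not how jclouds handles retries: the `IntSupplier` standing in for the HTTP call, the choice of 409 and 503 as the retryable codes, and the attempt limit are assumptions, and a real client would also back off between attempts.

```java
import java.util.function.IntSupplier;

class ReplaySketch {

    // Treats 409 (conflict) and 503 (e.g. a "slow down" response) as
    // "not processed, safe to replay". This set is an assumption.
    public static boolean isRetryable(int status) {
        return status == 409 || status == 503;
    }

    // Sends the request up to maxAttempts times; returns the last status seen.
    public static int sendWithReplay(IntSupplier request, int maxAttempts) {
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && isRetryable(status); attempt++) {
            status = request.getAsInt(); // replay the same request
        }
        return status;
    }

    public static void main(String[] args) {
        // Fake server: conflict twice, then success.
        final int[] calls = {0};
        IntSupplier fakeRequest = () -> ++calls[0] < 3 ? 409 : 200;
        System.out.println("final status: " + sendWithReplay(fakeRequest, 5));
        System.out.println("requests sent: " + calls[0]);
    }
}
```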

Replaying requests is only straightforward when there is no entity or content in the request. When there is content, there is a chance that it is not replayable. For example, the Java type InputStream is not replayable; in that case, it has to be recreated by the caller. Projects such as Apache HC note this as a normal concern and have guidance. Other projects leave the replayable issue up to the user.
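
The InputStream problem is easy to demonstrate with the JDK alone. In this sketch, reading a stream a second time yields nothing, while handing around a `Supplier<InputStream>` lets the sender recreate the content for each attempt. The class and method names are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

class ReplayablePayloadSketch {

    // Reads the stream to the end and returns the number of bytes seen.
    public static int drain(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "{\"kill\":{\"death\":\"slow\"}}".getBytes(StandardCharsets.UTF_8);

        // A consumed stream has nothing left for the second attempt.
        InputStream oneShot = new ByteArrayInputStream(body);
        System.out.println("first send:  " + drain(oneShot) + " bytes");
        System.out.println("replay:      " + drain(oneShot) + " bytes");

        // A supplier recreates the stream, so every attempt sees the full body.
        Supplier<InputStream> replayable = () -> new ByteArrayInputStream(body);
        System.out.println("first send:  " + drain(replayable.get()) + " bytes");
        System.out.println("replay:      " + drain(replayable.get()) + " bytes");
    }
}
```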

Because of the replayability issue, I strongly prefer that ReST resources (Cloud or otherwise) expose commands in a way that requires no content. Matrix is one way to do this, and it is fully supported by jclouds.

For example, here's how we model such a resource:

@POST
@Path("objects/{id}/action/{action}")
Future action(@PathParam("id") String id, @PathParam("action") String action,
      @BinderParam(BindMapToMatrixParams.class) Map options);
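
To make the matrix form concrete, here is a small sketch of turning an options map into matrix parameters on a path. It is not what BindMapToMatrixParams does internally: the helper name is an assumption, and the sketch skips percent-encoding, which real keys and values would need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MatrixParamsSketch {

    // Appends each option as ";name=value" to the last path segment.
    // No escaping is done, so this only suits simple keys and values.
    public static String withMatrixParams(String path, Map<String, String> options) {
        StringBuilder uri = new StringBuilder(path);
        for (Map.Entry<String, String> option : options.entrySet()) {
            uri.append(';').append(option.getKey()).append('=').append(option.getValue());
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<String, String>();
        options.put("death", "slow");
        System.out.println("POST " + withMatrixParams("/objects/robot/action/kill", options));
    }
}
```

Since the whole command lives in the request line, replaying it after a retry or redirect needs nothing more than the same string.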


I encourage you to engage in the evolution of client-friendly modeling by commenting on this and other blogs like the Cloud Tools Manifesto.

jclouds tour