Where Does Your Node Program Spend Its Time?

Originally posted at Dave Pacheco’s Blog.

Performance analysis is one of the most difficult challenges in building production software. If a slow application isn’t spending much time on CPU, it could be waiting on filesystem (disk) I/O, network traffic, garbage collection, or many other things. We built the Cloud Analytics tool to help administrators and developers quickly identify these sources of latency in production software. When the app is actually spending its time on CPU, the next step is figuring out what it’s doing: what functions it’s executing and who’s calling them.

Read More »

A Bit More About the New Joyent Cloud API

As part of SmartDataCenter 6.5, we’ve introduced something I’m pretty excited
about sharing, which is an open signing protocol for HTTP. We’re using it in
what we call CloudAPI, which is the REST API that powers our customer portal, and is
exposed to you to create your own applications. In this post I’ll go over what the signing
protocol is and why you should care, how SmartDataCenter is using it, and some
ideas we have in mind for the future.

If you’re not an RFC nerd, HTTP defines a “pluggable” framework for
authentication so that any number of authentication mechanisms can be used, and
goes on to define Basic Authentication. Basic Authentication is nothing more
than sending your username and password across the wire, so it’s not ideal from
a security perspective. Following Basic Authentication, HTTP also goes on to define
Digest Authentication, which involves sending a hashed form of your password
across the wire, but it’s hard to use and doesn’t see much use these days.

However, for the best security properties, you really want to be leveraging
public key cryptography to authenticate. HTTPS (well, really SSL) has a mode
of allowing this called “client-auth”; basically your client has a certificate
and private key, and in addition to validating the server’s certificate, the
server asks you (the client) to present your certificate. Sounds great, and
does do what we want from a security perspective. However, it’s plagued by poor
usability in browsers, is more or less a programming nightmare (especially with
“custom” CAs), and can incur significant server-side cost, since it’s usually
a load balancer (i.e., HTTPS termination point) that is handling the
cryptography validation. All of this means that we want the benefits of public key crypto
without the burden of client-auth SSL.

To solve this problem with SmartDataCenter, we’re introducing the HTTP Signature
authentication scheme, which is already released as open source. The HTTP
Signature scheme allows for digital signatures to be leveraged as your
credentials, and does not imply any particular key management scheme. It should
be suitable for just about any REST API out there, so we’re certainly hoping if
you’re an API vendor you take a look at it, but rather than jump into the
technical details of the spec itself, I thought it would be interesting to
show you some CLI candy, so let’s walk through some examples.
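
The general idea of signing a request can be sketched in a few lines of Python. This is only an illustration of the flow, not the spec itself: the header layout, the key id, and the choice to sign the Date header are assumptions made for the example. Because the Python standard library has no public key signing, HMAC-SHA256 stands in for the signer here; a real client would sign the same bytes with its private key.

```python
import base64
import hashlib
import hmac
from email.utils import formatdate


def sign_request(key_id: str, secret: bytes, date_header: str) -> dict:
    """Return request headers carrying a signature over the Date header."""
    # Stand-in signer: HMAC-SHA256. A real client would sign these bytes
    # with its private key instead of a shared secret.
    digest = hmac.new(secret, date_header.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("ascii")
    # Header layout is an assumption for illustration, not quoted from the spec.
    authorization = (
        f'Signature keyId="{key_id}",algorithm="hmac-sha256",'
        f'signature="{signature}"'
    )
    return {"Date": date_header, "Authorization": authorization}


def verify_request(secret: bytes, headers: dict) -> bool:
    """Recompute the signature on the server side and compare."""
    sent = headers["Authorization"].split('signature="', 1)[1].rstrip('"')
    expected = hmac.new(
        secret, headers["Date"].encode("utf-8"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(base64.b64decode(sent), expected)


if __name__ == "__main__":
    # "/demo/keys/example" is a hypothetical key id.
    headers = sign_request("/demo/keys/example", b"not-a-real-secret",
                           formatdate(usegmt=True))
    print(headers["Authorization"])
    print(verify_request(b"not-a-real-secret", headers))
```

The point of the shape is that no password crosses the wire: the server only needs to know which key the client claims to hold and whether the signature over the request data checks out.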

Read More »

Node on Azure and SDK Release

Earlier this week you may have seen word from Corporate Vice President Scott Guthrie or the Interoperability Team at Microsoft, but today the Azure team released their announcement regarding Node on Windows Azure and the Windows Azure SDK for Node.js. After many months of hard work between the Azure team and the core team here at Joyent on the port of Node and npm, we’re seeing Node’s first appearance in Microsoft’s products. For Joyent and the Node community, this is yet another strong statement regarding the maturity and enterprise-readiness that Node has achieved. Providing a first-class experience for Node developers on Windows is like strapping space shuttle solid rocket boosters to a technology that was already shattering airspeed records for community growth and enthusiasm. You will have the opportunity to hear more about Node from Microsoft’s perspective from Scott Guthrie himself in late January at the Node Summit.

Ryan Dahl, Engineer at Joyent and creator of Node, also chimes in on the importance of this achievement.

Scaling WordPress on Joyent Cloud: Part Three

The following is a continuation of our series on scaling WordPress and a repost of Peter Yorke’s original at peteryorke.net. Peter is a Solution Architect at Joyent with a passion for performance and scalability — he’s the man who knows how to make websites go to “11.”

You have hit the big time with your WordPress blog. GigaOM and Nikki Finke think you are the most influential blogger on the planet.

Congratulations: your MySQL database is about ready to crash or, worse, it already has.

In this post I discuss what you need to do with your database to keep this beast running smoothly and your blogging empire humming along.

Besides your users reporting that the blog is slow, what objective measures are there? I use the NewRelic Std feature that lets me see database latency. Here is an example of a database that needs help.

Read More »

Presenting File System Latency

Part 5 of 5 on Examining File System Latency in Production, by Brendan Gregg, Lead Performance Engineer at Joyent and author with Jim Mauro of “DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD.”

Previously I have explained why disk I/O metrics may not reflect application performance, and how some file system issues may be invisible at the disk I/O level. I then showed how to resolve this by measuring file system latency at the application level using MySQL as an example, and measured latency from other levels of the operating system stack to pinpoint the origin.

The main tool I’ve used so far is DTrace, which is great for prototyping new metrics in production. In this post, I’ll show what this can mean beyond DTrace, establishing file system latency as a primary metric for system administrators and application developers. I’ll start by discussing the history for this metric on the Solaris operating system, how it’s being used at the moment, and what’s coming next. The image below provides a hint:

Read More »

A 2000x Performance Win with DTrace Analytics

Note: This post originally appeared on the DTrace blog of @brendangregg

I recently helped analyze a performance issue in an unexpected but common place, where the fix improved performance of a task by around 2000x (two thousand times faster). As this is short, interesting and useful, I’ve reproduced it here in a lab environment to share details and screenshots.

Issue

In a production SmartOS cloud environment, a script is used to count entries in an active log. This script is executed from cron(1M), and the grep(1) command it uses is running very slowly indeed:

# time grep done log | wc -l
1492751

real    8m56.062s
user    8m12.275s
sys     0m0.218s
# ls -l log
-rw-r--r-- 1 root root 18M 2011-12-08 21:59 log

That took almost nine minutes. The grep(1) command is processing this file at about 34 Kbytes/sec, which seems awfully slow. Dropping the “wc -l” and using “grep -c” instead didn’t make much difference.
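
The throughput figure can be checked from the numbers in the output above. The snippet below is a small sketch of that arithmetic; it assumes the 18M reported by ls means 18 * 1024 Kbytes, and `parse_time` is a helper introduced here for illustration.

```python
def parse_time(value: str) -> float:
    """Convert a time(1) field such as '8m56.062s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


real_seconds = parse_time("8m56.062s")   # wall-clock time from the run above
file_kbytes = 18 * 1024                  # assumption: 18M means 18 * 1024 Kbytes
rate = file_kbytes / real_seconds        # roughly 34 Kbytes/sec

print(f"{real_seconds:.3f} seconds, {rate:.1f} Kbytes/sec")
```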

Hypothesis

I have a method for attacking performance issues (which I’ll explain in a later article); it starts with checking for errors, because they do cause performance issues and are a fast area to check.

The slow time sounds like it could be disk I/O related: either slow reads caused by disk errors, or random disk reads. I first checked the error counts:

# iostat -En | grep Hard
sd0              Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
sd1              Soft Errors: 0 Hard Errors: 0 Transport Errors: 0

Which were fine, and then for random reads:

# iostat -xnz 1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2    0.3    1.3    1.0  0.0  0.0    0.0    0.0   0   0 ramdisk1
   12.3   81.6  355.4 3005.9  0.0  0.1    0.0    0.7   0   2 sd1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    7.0    0.0  245.7  0.0  0.0    0.0    0.1   0   0 sd1
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    4.0    0.0   59.5  0.0  0.0    0.0    0.1   0   0 sd1
[...]

Random reads are not present either: the disks are almost idle, with %b staying between 0 and 2. This doesn’t look disk related.
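
The time(1) output from the first run points the same way. As a rough sketch (the `parse_time` helper is introduced here for illustration), comparing user plus sys time against real time shows what share of the wall-clock time was spent on CPU rather than waiting:

```python
def parse_time(value: str) -> float:
    """Convert a time(1) field such as '8m12.275s' to seconds."""
    minutes, seconds = value.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


real = parse_time("8m56.062s")
user = parse_time("8m12.275s")
sys_time = parse_time("0m0.218s")

# Share of wall-clock time spent executing on CPU (user + kernel).
cpu_share = (user + sys_time) / real

print(f"on-CPU share: {cpu_share:.0%}")
```

A share this close to 100% leaves little room for time blocked on disk I/O, which is consistent with the idle disks in the iostat output.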

Read More »

A Brand New MongoDB Appliance with Pre-Installed, Tuned Up Mongo Goodness

We’re down in Santa Clara for the annual MongoSV show and simultaneously announcing the launch of our brand new SmartMachine for MongoDB. This new virtual machine comes pre-loaded with a fully tuned, optimized, and ready to deploy instance of MongoDB, the NoSQL database that is becoming a key player in mobile, e-commerce, game and other responsive, low-latency applications. The growth of the MongoDB community has been extremely rapid as more people have come to embrace NoSQL data stores for modern Web application and DIRTY (data-intensive, real-time) uses.

In discussions we also heard many devs talk about how well MongoDB and Node.js (the application environment of which Joyent is a corporate steward) play together, with JavaScript, JSON, and other nice matches that make mating the two ridiculously easy to pull off. MongoDB running on top of Joyent’s SmartOS SmartMachine brings a lot of additional goodies, including:

  • On-the-fly resizing of the database with no reboot or service interruption
  • Bursting of up to 800% at no extra charge to keep the database responding smoothly, even under extreme loads
  • DTrace-driven analytics for mapping latency outliers in four-dimensional heat maps rather than fighting through logs or staring at graphs or charts that hide the bad actors

You can get your feet wet with MongoDB+Node.js with this simple blog post by our friends at MongoLab explaining how to set up a basic Node.js Web server running on a MongoLab database as a service. We also have some details on putting both MongoDB and Node.js on a Joyent SmartMachine virtual server. MongoHQ, like MongoLab, is another viable PaaS that allows you to play around with MongoDB in a simple-to-manage software-as-a-service environment.

Most importantly, we’re interested in what the Joyent Cloud community is going to do with this new SmartMachine purpose-built for MongoDB apps. Please share and let us know what you think and what improvements can be made. Thanks!

Circonus: Copper Plan Free for 90 Days to Joyent Cloud Clients and the Rise of DevOps

We’ve known Theo Schlossnagle (@postwait) for quite a while and we respect him greatly, which is one of the reasons why we have supported a partnership with Circonus, the company that he founded and runs. We also have a shared vision that some people call DevOps. That is, the line between software development and operations is blurring and so are the tools used to accomplish both tasks. (In fact, our Director of Systems Engineering, Ben Rockwood, just gave a talk about this as the keynote at the Usenix LISA 2011 conference in Boston this week, and here’s an interview with him from the Usenix blog on this topic.) The point being, for organizations to run applications that have the highest performance and reliability, it’s critically important that developers understand the implications of what they do in operational terms, and that operations teams communicate what they are seeing to dev teams to help them build better software products. This contributes to a positive feedback loop of continual iteration and improvement. Ergo, DevOps.

And this is where Circonus comes in as a Joyent partner. Circonus provides organization-wide unified monitoring across a wide variety of critical parameters spanning an organization’s entire infrastructure constellation. In a nutshell, they make it much easier for ops teams to spot problems and monitor performance. (They also work in partnership with another key Joyent Cloud partner, the application and server monitoring service New Relic, and the two services are quite complementary.)

It’s all part of the growing DevOps ecosystem that we at Joyent see as a key piece of the future for cloud computing. (It also ties in with our own Cloud Analytics product for highly granular root cause analysis and application tuning, another complementary offering.) So you’ll be hearing a lot about this from us going forward as the concept of DevOps and the tools around it continue to evolve. Circonus is currently offering a special incentive to Joyent Cloud clients: the Circonus Copper plan free for a full 90 days. Joyent Cloud clients can take advantage of this promotion here: http://circonus.com/partners/joyent/sign-up. It’s a great deal, and we are always open to any feedback you may have about our partner services. Thanks for reading!

Drilling Down Into the Kernel

Part 4 of 5 on Examining File System Latency in Production, by Brendan Gregg, Lead Performance Engineer at Joyent and author with Jim Mauro of “DTrace: Dynamic Tracing in Oracle Solaris, Mac OS X and FreeBSD.”

Previously I showed how to trace file system latency from within MySQL using the pid provider. Here I’ll show how similar data can be retrieved using the DTrace syscall and fbt providers. These allow us to trace at the system call layer, and deeper in the kernel at both the Virtual File System (VFS) interface and within the specific file system itself.

Syscall Tracing

From the system call layer, the file system can be traced system-wide, examining all applications simultaneously (no “-p PID”), using DTrace’s “syscall” provider:

Read More »

eBay chooses Node.js as the runtime stack in ql.io: a data-retrieval and aggregation gateway for HTTP APIs

Today eBay announced their release of ql.io, a declarative, evented, data-retrieval and aggregation gateway for HTTP APIs.

Subbu Allamaraju, an Architect at eBay, explains, “Through ql.io, we want to help application developers increase engineering clock speed and improve end user experience. ql.io can reduce the number of lines of code required to call multiple HTTP APIs while simultaneously bringing down network latency and bandwidth usage in certain use cases.”

Subbu goes on to provide many impressive early ql.io benchmarks along with some highlights from their experience in using JavaScript and Node.js as their language and runtime stack. These are impressive testimonials on expediting development cycles while boosting connection scale and speed at massive rates. This also provides valuable insight into the rapidly emerging Enterprise decision to choose Node for large-scale production applications.
