Gary Allison's Leadership Blog


Cloud Computing | 11 Dec 2017 05:54 am
Time to dedicate to this blog has been scarce, but I resolve to post more frequently this coming year. I just completed my second AWS certification and thought I’d write a few notes here to hopefully help others.  The blueprint I have found successful:
  1. First, I would heavily recommend training courses as prep for the exam. They are a great foundation to start from: exceptionally well done, very affordable, and full of great exam tips.  These were recommended to me at the 2016 re:Invent and it was a great suggestion. Just do it.
  2. There’s no substitute for experience.  I’m convinced if not for my years of experience with TCP/IP, software development, network configuration, etc, I would not have passed the exams.  While I don’t have deep experience developing on AWS, my background with AWS, years of experience, plus the prep was a winning combination.
  3. Read the FAQs. Yes, they are more than 120 chars long.  They may take you quite a while to get through, but it is very helpful for the exam.
  4. Google for AWS Solution Architect Associate Exam Tips or AWS Developer Associate Exam Tips.  That’s probably how you found this post so you’re ahead of the game.  Read the tips people give.
  5. If you are working full time, and have a family, allow 3-4 months per test.  Yeah, it takes that long, but hey you’re probably smarter than me.
  6. The test is at least a year behind the AWS services. AWS is undergoing such rapid innovation that the certification exams have a hard time keeping up.  Tips found here and through Google searches really pay off, as they tend to reflect the current state of the certification exams rather than the current state of AWS.

Experience with the AWS Solutions Architect Associate exam:

Took this exam in Jan 2017. The largest topics on this exam were VPCs, S3, and EC2, with a bit of Route 53 thrown in for good measure.  Take the course.  You need to read the S3 FAQ (yes, the minimum file size is 0 bytes), and understand VPCs, including NAT instances.  Know all your HTTP return codes.  Know SQS vs SNS. Know Alias records, A records, and the rest of Route 53. I recall multiple security questions: read the Well-Architected Framework and AWS security best practices in addition to the Shared Responsibility Model document.  All of this is good practical knowledge in addition to being essential to scoring well on the exam.

I took this exam first and felt it was pretty difficult.  I scored in the 80s and was happy to achieve the score.  Some of the questions are straight single answer multiple choice, but many are pick 2 and even some pick 3 answer variations.  These are not fun.

Experience with the AWS Certified Developer Associate exam

Took this exam at re:Invent 2017 (November 2017). Having completed the Solutions Architect exam made this exam easier.  As you’d guess, it is more developer-centric, but it also has a fair amount of overlap with the Solutions Architect exam. The overlap areas are VPC, S3, Route 53, and a bit of EC2 (I think we all know what happens to instance-based storage when the instance terminates, eh?). SQS and SNS were also represented by about 5 or so questions on the Developer Associate exam.  I think there was 1 question on SWF.

Where the exams differ is that the Developer exam is heavy on DynamoDB: at least 3 questions on calculating different kinds of read and write throughput. Also, there are specific questions on API calls. Read the S3 API, the DynamoDB API, and understand how federated authentication works (AssumeRoleWithWebIdentity, AssumeRoleWithSAML).
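If the throughput math is new to you, it is worth being able to do it quickly on scratch paper. A back-of-the-envelope sketch of the calculation (my own helper function, not an AWS API):

```python
import math

def provisioned_capacity(item_size_bytes, reads_per_sec, writes_per_sec,
                         strongly_consistent=True):
    """Back-of-the-envelope DynamoDB capacity-unit math.

    One read capacity unit covers one strongly consistent read/sec of an
    item up to 4KB (or two eventually consistent reads); one write
    capacity unit covers one write/sec of an item up to 1KB.
    """
    read_blocks = math.ceil(item_size_bytes / 4096)   # reads are metered in 4KB blocks
    write_blocks = math.ceil(item_size_bytes / 1024)  # writes are metered in 1KB blocks
    rcu = reads_per_sec * read_blocks
    if not strongly_consistent:
        rcu = math.ceil(rcu / 2)  # eventually consistent reads cost half
    wcu = writes_per_sec * write_blocks
    return rcu, wcu

# e.g. 80 strongly consistent reads/sec of 6KB items, 10 writes/sec:
print(provisioned_capacity(6 * 1024, 80, 10))  # (160, 60)
```

Reads are metered in 4KB blocks and writes in 1KB blocks, which is why a 6KB item rounds up to two read blocks per read but six write blocks per write.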

There were one or two questions featuring API syntax, but if you are firmly rooted in the principles, you can pick the right answer without memorizing the syntax.  CloudFormation, ElasticBeanstalk and the SDKs were covered at a high-level with a few questions (certainly know what languages are supported for each).

While I won’t give away the specific questions on the exam, there is one that I found so humorous that it must be shared. The question had to do with legitimate endpoints for SNS, and one of the choices was Named Pipes.  Now that was a blast from the past!  I almost laughed out loud in the exam.  Whoever wrote that question, I salute you.

Know your limits for both Certification Exams

I found it handy while studying to create a table of minimums and maximums for various services in AWS.  This is accurate as of Nov 2017, but beware that these limits can and do change.  But then again, you have a year before they update the exam.  Hope it is helpful.  Good luck!

S3 object size: single PUT limit is 5GB; use multi-part upload for anything larger than 100MB
S3 buckets: 100 per account by default; call AWS to raise the limit
S3 availability: 99.9% for IA, 99.99% for Standard
1 byte
DynamoDB block size: 1KB for writes, 4KB for reads (eventually consistent reads are 2/sec, strongly consistent reads are 1/sec, all writes are 1/sec)
DynamoDB BatchWriteItem: 25 items, up to 16MB
DynamoDB BatchGetItem: 100 items, up to 16MB
DynamoDB Query: 1MB max returned
DynamoDB Global Secondary Indexes: 5 max; the partition key can be on any attribute
SQS default visibility timeout: 30 sec (maximum 12 hours); extend it by calling ChangeMessageVisibility
SQS message delay: up to 15 mins
SQS message size: billed in 64KB chunks
SQS requests: 1 to 10 messages per request, up to 256KB total
SQS retention: 14 days
SQS long polling: maximum wait time is 20 seconds
SNS topics: names up to 256 chars
SWF retention: 1 year
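To make one of these limits concrete: SQS bills each request in 64KB chunks up to the 256KB maximum. A quick helper (my own sketch, not part of any SDK) shows why message size matters for cost:

```python
import math

SQS_CHUNK = 64 * 1024         # SQS bills requests in 64KB chunks
SQS_MAX_MESSAGE = 256 * 1024  # maximum SQS message size

def billed_chunks(message_bytes):
    """Number of 64KB billing chunks consumed by one SQS message."""
    if message_bytes > SQS_MAX_MESSAGE:
        raise ValueError("message exceeds the 256KB SQS limit")
    return max(1, math.ceil(message_bytes / SQS_CHUNK))

print(billed_chunks(70 * 1024))   # 2: a 70KB message bills as two chunks
print(billed_chunks(256 * 1024))  # 4: a maximum-size message bills as four
```

A message one byte over a 64KB boundary is billed the same as one that fills the next chunk, so it pays to keep payloads just under the boundary.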
Cloud Computing and Leadership and Teams | 25 Jun 2016 03:27 pm

{This is the final post in a 3 part series intended to tell the story of how we have been able to achieve an epic rearchitecture of our core platform. Special thanks to all of those who have helped in review and editing the original white paper.}

Rearchitecting the Team

In addition to rearchitecting the service to scale, we also had to rearchitect our team. As we set out on this journey to rebuild our solution into a scalable, cloud based service oriented architecture, we had to reconsider the very way our teams are put together.  We reimagined our team structure to include all the ingredients the team needs to go fast.  This meant a big investment in devops – engineers that focus on new architectures, deployment, monitoring, scalability, and performance in the cloud.  

A critical part of this was a cultural transformation where the service is completely owned by the team, from understanding the requirements, to code, to automated test, to deployment, to 24×7 operation.  This means building out a complete monitoring and alerting infrastructure and rotating on-call duty through all members of the team.  The result is that the team becomes 100% aligned around the success of the service and there is no wall to throw anything over – the commitment and ownership stay with the team.

For this team architecture to succeed, the critical element is to ensure the team has all the skills and team players needed to succeed. This means platform services to support the team, strong product and program managers, talented QA automation engineers that can build on a common automation platform, gifted technical writers, and of course highly talented developers. These teams are built to learn fast, build fast, and deploy fast, completely independent of other teams.

Supporting the service-oriented teams, a key element is the Platform Infrastructure team we created to provide a common set of cloud services to all our teams. Platform Infrastructure is responsible for the virtual private cloud (VPC) supporting the new services running in Amazon Web Services. This team handles the overall concerns of security, network, service discovery, and other common services within the VPC. They also set up a set of best practices, such as ensuring all cloud instances are tagged with the name of the team that started them.

To ensure the best practices are followed, the Platform Infrastructure team created Beavers (a play on words describing an engineer at Bazaarvoice, a BVer). An idea borrowed from Netflix’s Chaos Monkey, these are automated processes that run and examine our cloud environment in real time to ensure the best practices are followed. For example, the Conformity Beaver runs regularly and checks to make sure all instances and buckets are tagged with team names. If it finds one that is not, it infers the owner and emails the team alias about the problem.  If not corrected, Conformity Beaver can terminate the instance.  This is just one example of the many Beavers we have created to help maintain consistency in a world where we have turned teams loose to move as quickly as possible.
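A Conformity-Beaver-style check is easy to sketch. The version below is purely illustrative (it is not Bazaarvoice’s actual code, and the tag name and data shape are assumptions); it scans instance descriptions shaped like the EC2 DescribeInstances output for a team tag:

```python
def untagged_instances(instances, required_tag="team"):
    """Return the ids of instances missing the required tag.

    `instances` is a list of dicts shaped like boto3's
    describe_instances output:
    {"InstanceId": "...", "Tags": [{"Key": "...", "Value": "..."}]}
    """
    offenders = []
    for inst in instances:
        tag_keys = {t["Key"].lower() for t in inst.get("Tags", [])}
        if required_tag not in tag_keys:
            offenders.append(inst["InstanceId"])
    return offenders

fleet = [
    {"InstanceId": "i-111", "Tags": [{"Key": "team", "Value": "platform-infra"}]},
    {"InstanceId": "i-222", "Tags": []},
]
print(untagged_instances(fleet))  # ['i-222']
```

In a real Beaver the list would come from a live DescribeInstances call and the offenders would be emailed (and eventually terminated), but the conformance logic itself is this small.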

An additional key common capability created by the Platform Infrastructure team is our Badger monitoring service. Badger enables teams to easily plug in a common healthcheck monitoring capability and can automatically discover nodes as they are started in the cloud. This service enables teams to easily implement healthchecks that are captured in a common place and escalated through a notification system in the event of a service degradation.
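To make the idea concrete, the heart of a healthcheck that a monitor like Badger could aggregate might boil down to something like this (illustrative only; Badger’s real interface is internal to Bazaarvoice):

```python
def healthcheck(checks):
    """Run named probe functions and fold them into one status payload.

    `checks` maps a dependency name to a zero-arg callable that returns
    True when that dependency is healthy. A probe that raises is
    treated as unhealthy rather than crashing the healthcheck itself.
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    return {"healthy": all(results.values()), "checks": results}

status = healthcheck({"cassandra": lambda: True, "elasticsearch": lambda: False})
print(status["healthy"])  # False
```

The monitoring service then only has to poll one endpoint per node and escalate when `healthy` flips to false.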

The Proof is in the Pudding

The Black Friday and holiday shopping season of 2015 was one of the smoothest ever in the history of Bazaarvoice, all while serving record traffic. From Black Friday to Cyber Monday, we saw over 300 million visitors.  At peak on Black Friday, we were seeing over 97,000 requests per second as we served up over 2.6 billion review impressions, a 20% increase over the year before. There have been years of hard work and innovation that preceded this success, and it is a testimony to what our new architecture is capable of delivering.

Keys to success

A few ingredients we’ve found to be important to successfully pull off a large scale rearchitecture such as described here:

  • Brilliant people. There is no replacement for brilliant engineers who are fearless in adopting new technologies and tackling what some will say can’t be done.
  • Strong leaders – and the right leaders at the right time. Often the leaders that sell the vision and get an undertaking like this going will need to be supplemented with those that can finish strong.
  • Perseverance and Determination – building a new platform using new technologies is going to be a much bigger challenge than you can estimate, requiring new skills, new approaches, and lots of mistakes. You must be completely determined and focused on the end game.
  • Tie back to business benefit – keep the business informed of the benefits and ensure that those benefits are delivered continuously rather than in a big bang. It will be a large investment and it is important that the business sees some level of return as quickly as possible.
  • Make space for innovation – create room for engineers to learn and grow. We support this through organizing hackathons and time for growth projects that benefit the individual, team, and company.

Rearchitecture is a Journey

One piece of advice: don’t be too critical of yourself along the way; celebrate each step of the rearchitecture journey. As software engineers, we are driven to see things “complete”, wrapped up nice and neat, finished with a pretty bow. When replacing an existing system of significant complexity, this ideal is a trap, because in reality you will never be complete. It has taken us over 3 years of hard work to reach this point, and there are more things we are in the process of moving to newer architectures. Once we complete the things in front of us now, there will be more steps to take, since we live in an ever-evolving landscape. It is important to remember that we can never truly be complete, as there will always be new technologies and new architectures that deliver more capabilities to customers, faster, and at a lower cost. It’s a journey.

Perhaps that is the reason many companies can never seem to get started. They understandably want to know “When will it be done?” “What is it going to cost?”, and the unpopular answers are of course, never and more than you could imagine.  The solution to this puzzle is to identify and articulate the business value to be delivered as a step in the larger design of a software platform transformation. Trouble is of course, you may only realistically be able to design the first few steps of your platform rearchitecture, leaving a lot of technical uncertainty ahead. Get comfortable with it and embrace it as a journey. Engineer solid solutions in a service oriented way with clear interfaces and your customers will be happy never knowing they were switched to the next generation of your service.

Cloud Computing and Leadership and Teams | 18 Jun 2016 05:51 am

{This is the second in a 3 part series of posts intended to tell an impressive story of how we have been able to achieve an epic rearchitecture of our core platform. Special thanks to all of those who have helped in review and editing the original white paper.}

The Journey Begins

One of the first things we decided to tackle was to start moving analytics and reporting off the existing platform so that we could deliver new insights to our clients showing how reviews are used by shoppers in their purchase decisions.  This choice also enabled us to decouple the architecture and spin up parallel teams to speed delivery. To deliver these capabilities, we adopted big data architectures based on Hadoop and HBase to be able to assimilate hundreds of millions of web visits into analytics that would paint the full shopper journey picture for our clients.  By running map reduce over the large set of review traffic and purchase data, we are able to give our clients insight into these shopper behaviors and help our clients better understand the return on investment they receive from consumer generated content.  As we built out this big data architecture, we also saw the opportunity to offload reporting from the review display engine.  Now, all our new reporting and insight efforts are built off this data and we are actively working to move existing reporting functionality to this big data architecture.

On the front end, flexibility and mobile were huge drivers in our rearchitecture.  Our original template-driven, server-side rendering can provide flexibility, but that ultimate flexibility is only required in a small number of use cases.  For the vast majority, client-side rendering via JavaScript, with behavior that can be configured through a simple UI, would yield a better mobile-enabled shopping experience that’s easier for clients to control.  We made the call early on not to try to force migration of clients from one front-end technology to another.  For one thing, it’s not practical for a first version of a product to be 100% feature-equivalent to its predecessor.  For another, there was simply no reason to make clients choose.  Instead, as clients redesigned their sites and as new clients were onboarded, they opted in to the new front-end technology.

We attracted some of the top JavaScript talent in the country to this ambitious undertaking. There are some very interesting details of the architecture we built that have been described on our developer blog and that are available as open source projects in our Bazaarvoice GitHub organization. Look for the post describing our Scoutfile architecture in March of 2015. The BV team is committed to giving back to the open source community and we hope this innovation helps you in your rearchitecture journey.

On the backend, we took inspiration from both Google and Netflix. It was clear that we needed to build an elastic, scalable, reliable, cloud-based data store and query layer. We needed to reorganize our engineering team into autonomous, service-oriented teams that could move faster. We needed to hire and build new skills in new technologies.  We needed to be able to roll this out as transparently as possible to our clients while serving live shopping traffic, so that no one would know it was happening at all.  Needless to say, we had our work cut out for us.

For the foundation of our new architecture, we chose Cassandra, an open source NoSQL data store influenced by ideas from Google’s BigTable architecture.  Cassandra had been battle-hardened at Netflix and was a great solution for a cloud-resilient, reliable storage engine. On this foundation we built a service we call Emo, originally intended for sentiment analysis.  As we made progress towards delivery, we began to understand the full potential of Cassandra and its NoSQL-based architecture as our primary display storage.

With Emo, we have solved the potential data consistency issues of Cassandra and guarantee ACID database operations. We can also seamlessly replicate and coordinate a consistent view of all the rating and review data across AWS availability zones worldwide, providing a scalable and resilient way to serve billions of shoppers.  We can also be selective about which data replicates, for example out of the European Union (EU), so that we can provide assurances of privacy for EU-based clients. In addition to this consistency capability, Emo provides a databus that allows any Bazaarvoice service to listen for the kinds of changes that service particularly needs, perfect for a new service-oriented architecture. For example, a service can listen for the event of a review passing moderation, which would mean that it should now be visible to shoppers.

While Emo/Cassandra gave us many advantages, its NoSQL query capability is limited to what Cassandra’s key-value paradigm offers. We learned from our experience with Solr that having a flexible, scalable query layer on top of the master datastore resulted in significant performance advantages for calculating on-demand results of what to display during a shopper visit. This query layer naturally had to provide distributed advantages to match Emo/Cassandra. We chose Elasticsearch for our architecture and implemented a flexible rules engine we call Polloi to abstract the indexing and aggregation complexities away from engineers on teams that would use this service.  Polloi hooks up to the Emo databus and provides near real-time visibility into changes flowing into Cassandra.

The rest of the monolithic code base was reimplemented into services as part of our service oriented architecture. Since your code is a direct reflection of the team, as we took on this challenge we formed autonomous teams that owned everything full cycle from initial conception to operation in production. We built the teams with all the skills needed for success: product owners, developers, QA engineers, UX designers (for front end), DevOps engineers, and tech writers. We built services that managed the product catalog, UI Configuration, syndication edges, content moderation, review feeds, and many more.  We have many of these rearchitected services now in production and serving live traffic. Some examples include services that perform the real time calculation of what Brands are syndicating consumer generated content to which Retailers, services that process client product catalog feeds for 100s of millions of products, new API services, and much more.

To make all of the above more interesting, we also created this service-oriented architecture to leverage the full power of Amazon’s AWS cloud. It was clear we had the uncommon opportunity to build the platform from the ground up to run in the cloud, with monitoring, elastic resiliency, and security capabilities that were unavailable in previous data center environments.  With AWS, we can take advantage of new hardware platforms with the push of a button, create multi-datacenter failover capabilities, and use new capabilities like Elastic MapReduce to deliver big data analytics to our clients.  We built auto-scaling groups that allow our services to automatically add compute capacity as client traffic demands grow. We can do all of this with a highly skilled team that focuses on delivering customer value instead of hardware procurement, configuration, deployment, and maintenance.

So now after two plus years of hard work, we have a modern, scalable service-oriented solution that can mirror exactly the original monolithic service. But more importantly, we have a production hardened new platform that we will scale horizontally for the next 10 years of growth.  We can now deliver new services much more quickly leveraging the platform investment that we have made and deliver customer value at scale faster than ever before.

So how did we actually move 300 million shoppers without them even knowing?

Divide and Conquer

As engineers, we often like nice clean solutions that don’t carry along what we call technical debt.  Technical debt is, literally, stuff that we have to go back and fix or rewrite later, or that requires significant ongoing maintenance effort.  In a perfect world, we fire up the new platform and move all the traffic over.  If you find that perfect world, please send an Uber for me. Add to this the scale of traffic we serve at Bazaarvoice, and it’s obvious it would take time to harden the new system.

The secret to how we pulled this off lies in the architectural choice to break the challenge into two parts: frontend and backend.  While we reengineered the front end into the new JavaScript solution, there were still thousands of customers using the template-based front end.  So, we took the original server-side rendering code and turned it into a service talking to our new Polloi service.  This enabled us to handle requests from client sites exactly like the classic original system.

Also, we created a service that improved upon the original API but was compatible from a specific version forward.  We chose not to try to be compatible for all versions for all time, as all APIs go through evolution and deprecation.  We naturally chose the version that was compatible with the new JavaScript front end.  With these choices made, we could independently decide when and how to move clients to the new backend architecture, irrespective of the front-end service they were using.

A simplified view of this architecture looks like this:

Simplified view of New Architecture


With the above in place, we can switch a JavaScript client to the new version of the API simply by changing the endpoint associated with the API key.  For a template-based client, we can change the endpoint to the new rendering service through a configuration in our CDN, Akamai.

Testing for compatibility is a lot of work, though not particularly difficult. API compatibility is pretty straightforward, while testing whether a template page renders correctly is a little more involved, especially since those pages can be highly customized.  Since it was a one-time event, we found the most effective way to accomplish the latter was manual inspection, making sure the pages rendered exactly the same on our QA clusters as they did in the production classic system.

We found success early on by moving cohorts of customers to the new system together. At first we would move a few at a time, making absolutely sure the pages rendered correctly, monitoring system performance, and looking for any anomalies.  If we saw a problem, we could move them back quickly by reversing the change in Akamai. At first much of this was also manual, so in parallel, we had to build up tooling to handle the switching of customers, which even included working with Akamai to enhance their API so we could automate changes in the CDN.

From moving a few clients at a time, we progressed to moving tens of clients at a time. Through a tremendous engineering effort, in parallel we improved the scalability of our Elasticsearch clusters and other systems, which allowed us to move hundreds of clients at a time, then 500 clients at a time. As of this writing, we’ve moved over 5,000 sites and 100% of our display traffic is now being served from our new architecture.
More than just serving the same traffic as before, we have also been able to move over display traffic for new services like our Curations product, which takes in and processes millions of tweets, Instagram posts, and other social media feeds.  That our new architecture could handle this additional, large-scale use case without change is a testimony to the innovative engineering and determination of our team over the last 2+ years. Our largest future opportunities are enabled because we’ve successfully been able to realize this architectural transformation.

{more to come in part 3}

Cloud Computing and Leadership and Teams | 11 Jun 2016 05:29 am

{This is the first in a 3 part series of posts intended to tell an impressive story of how we have been able to achieve an epic rearchitecture of our core platform. Special thanks to all of those who have helped in review and editing the original white paper.}

At Bazaarvoice, we’ve pulled off an incredible feat, one that is such an enormous task that I’ve seen other companies hesitate to take on.  We’ve learned a lot along the way and I wanted to share some of these experiences and lessons in hopes they may benefit others facing similar decisions.

The Beginning

Our original Product Ratings and Review service served us well for many years, though it eventually encountered severe scalability challenges. There were several aspects we wanted to change: a monolithic Java code base, fragile custom deployment, and server-side rendering. Creative use of tenant partitioning, data sharding, and horizontal read scaling of our MySQL/Solr-based architecture allowed us to scale well beyond our initial expectations. We’ve documented how we accomplished this scaling on our developer blog in several past posts if you’d like to understand more.  Still, time marches on, and our clients have grown significantly in number and content over the years.  New use cases have come along since the original design: emphasis on the mobile user and responsive design, accessibility, the emphasis on a growing network of consumer generated content flowing between brands and retailers, and the taking on of new social content that can come in floods from Twitter, Instagram, Facebook, etc.

As you can imagine, since the product ratings and reviews in our system are displayed on thousands of retailer and brand websites around the world, the read traffic from review display far outweighs the write traffic from new reviews being created.  So, the addition of clusters of Solr servers that are highly optimized for fast queries was a great scalability addition to our solution.

A highly simplified diagram of our classic architecture:

Simplified View of Classic Architecture

However, in addition to fast review display when a consumer visited a product page, another challenge started emerging out of our growing network of clients.  This network is comprised of brands, like Adidas and Samsung, who collect reviews on their websites from consumers who purchased the product and then want to “syndicate” those reviews to a set of retailer ecommerce sites where shoppers can benefit from them. Aside from the challenges of product matching, which are very interesting, under the MySQL architecture this meant the reviews could be copied over and over throughout the network.  This approach worked for several years, but it was clear we needed a plan for the future.

As we grew, so did the challenge of an expanding volume of data in the master databases to serve across an expanding network of clients.  This, together with the need to deliver more front-end web capability to our customers, drove us to what I hope you will find is a fascinating story of rearchitecture.

Cloud Computing and Leadership and Teams | 15 Apr 2016 07:58 am

It has been an absolutely crazy year since my last post. Work and family have both had me consumed, and while that hasn’t fundamentally changed, there is a story that must be told and I’ve carved out the time to write a rather long narrative that I’ll break up into three parts for our Bazaarvoice developers blog and will cross post here.

The story is one of an epic replatform that has reached a critical milestone of completion.  It is a story of innovation and determination by brilliant people whom I’ve had the honor to lead through a good portion of the effort. It is also a story of leadership and organizational change.  Truly, it deserves its own book.  Hmmm.

This is a brief teaser of what is to come.  I am so proud of the team at BV and what we have accomplished.  Can’t wait to share it with you.

Agile Software and Cloud Computing and Effective Software Projects and Tech News | 24 May 2014 07:15 am


Monday, an article was posted on Forbes titled “Why Software Doesn’t Follow Moore’s Law”. This topic has been on my mind lately as we’ve been working to roll out a new platform with greater scalability, lower maintenance costs, and significantly enhanced capabilities. For engineers that have worked only on the new platform, it is easy to disparage the original platform. Many of them were not even professional engineers when the original platform was written, and they lose sight of how software architectures have changed over the years, from the traditional three-tier architecture to the lambda architectures of today.

It’s interesting to break this down a bit more. Are engineers smarter today than they were 8-10 years ago? And were those of 10 years ago smarter than those of 20 years past? In my humble estimation, no, and one could argue today’s graduates know appreciably less about how computers actually work, but let’s not digress.

What has changed is the way software is constructed. One can argue that modern languages have a major role to play in this, but one can argue also they are an outcome of the changes in architecture. But why has the architectural change occurred?

I believe a significant part of the change in software architecture is attributable to Moore’s law, somewhat in opposition to Mr Maccaba’s article in Forbes this week. Certainly engineers 10 and perhaps 20 years ago understood the advantages of isolation and service oriented architectures. Why didn’t they build systems in this way?

A major reason was that the systems and networks in place really couldn’t support it. Without gigabit networks, fast processors, and super cheap compute, the emphasis was on highly optimized code paths. A REST call? You would have been laughed out of town. Now the opposite is true.

Thus, Moore’s law has enabled much less efficient architectures to become feasible. Mr Maccaba also asserts this, but misses the point. These new architectures are much more scalable, can be built with fewer dependencies, can be changed independently since they are loosely coupled, and allow previously unobtainable compute goals to be broken down into independent parts and delivered. This is true whether we are talking about large-scale web application infrastructure or big data analytics.

By constructing software architectures that take advantage of Moore’s law, we are solving problems that could never be solved before and constructing software systems of higher quality that we can actually deliver in much faster times to market. While certainly not doubling every 18 months, the time to market of new highly scalable solutions is measured in months today instead of years.

At the end of the day, I feel the point Mr Maccaba has missed here is that Moore’s law doesn’t apply to humans. Thus, we leverage Moore’s law to make humans more effective in software engineering through architectures that do deliver significant increases in scalability, quality, and capability.

Cloud Computing and Effective Software Projects and Leadership and Teams | 27 Nov 2011 10:28 pm

Last week, one of my team members forwarded me a link to this blog by Savio Rodrigues, entitled Why devops is no silver bullet for developers.  It’s a well written blog and Savio makes some good points, namely that environments that the Devops team hopes to build on need to be standardized. He comes so close to hitting some important topics right on the head, and then just misses the mark slightly, IMHO.

Savio nails it when he points out

“One thing I’ve come to understand is that these two groups tend to think differently, even if they are using the same words and nodding in agreement.”

Bingo Savio.  He goes on to say,

“It’s no surprise developers want to adopt tools and processes that allow them to become more efficient in delivering new applications and continuous updates to existing applications. Today, these two tasks are hindered to a degree by the operations teams that are responsible for production environments”

But then, he misses an opportunity to drive the point home and starts a discussion about standards. I agree standards are important, but what needs to be reckoned with are the very different culture, goals, and reward systems between the two disciplines of Engineering/Development and IT/Operations.

How are these teams measured and rewarded? The answers to these questions tell you many things about the team’s culture. A Development team is typically measured and rewarded by amount of innovation, quality of their deliverables, timeliness of delivery, and responsiveness to market.  An IT team is measured and rewarded typically by uptime, stability, security, and control.  (Note rewarded can mean “not punished due to failure” as well as more expected definitions of reward).

All of the above seem like good things! We want uptime, innovation, quality, stability, etc!  Right? I envision one could draw a Venn diagram for the Dev culture and the IT culture and there would be overlap, but there would be just as much outside the intersection.  Innovation is often at odds with stability.  Responsiveness to market can be at odds with uptime, etc.

We’ve had the good fortune of having a few opportunities to implement a new Devops model.  When everyone is rowing together the boat certainly moves faster in the desired direction. But it is difficult. It requires continual investment in the Devops team because at the core, these two very different cultures aren’t going away anytime soon.  Savio sees it too when he says, “This isn’t a technical issue. It’s a cultural issue.” I’d suggest we spend as much time looking at the measurements and rewards as we do thinking about standardizing platforms.

Cloud Computing and Everyday Tech27 Nov 2011 08:38 am

Recently, a tragedy hit one of the members on my team where he lost his home.  Our team has rallied around him and his family and have done what we can to help – that’s just the kind of people I’m fortunate enough to work with.  In talking to him, one of his regrets is that he didn’t have his photos backed up offsite.  He said he looked into it, but then just didn’t get around to it.  That was inspiration to get me moving…

I investigated a number of commercial solutions first, and the best I found was Carbonite.  One yearly fee to back up all your documents, music and photos (no movies), $59. That is hard to beat for those with a significant amount of photos or music.  (With the ever improving CCD imaging of digital cameras, every time you buy a new camera, the photo files are larger. Is it a plot between the hard drive makers and the camera manufacturers? LOL) Sounds like a great deal, right?

The Carbonite app installed smoothly and ran well. It seems one key to their business model is to control bandwidth. Or, perhaps the service is very popular. They warn you that the initial backup could take several days.  Well, after more than a week, mine was still less than 50% complete.  About that time my trial period and patience both expired.  If you don't mind leaving your computer on for a month, this still looks like a very good option.  They also have a switch in their UI where you can use less bandwidth on the upload.  This will make the backup take even longer, but will allow the kids to still watch YouTube while you are taking your backups. Lastly, they have a web UI where you can explore your backed up files from anywhere. It's a viable solution IMHO.

I started looking at other Cloud Backup solutions for my Mac (not the kind of cloud backup they get in Kentucky where it just seems to rain all the time). Amazon S3 seemed like the natural next choice to investigate, but what is it going to cost? Looking at S3 pricing, currently it runs:

                       Standard Storage   Reduced Redundancy Storage
First 1 TB / month     $0.140 per GB      $0.093 per GB
Next 49 TB / month     $0.125 per GB      $0.083 per GB
Next 450 TB / month    $0.110 per GB      $0.073 per GB
Next 500 TB / month    $0.095 per GB      $0.063 per GB
Next 4000 TB / month   $0.080 per GB      $0.053 per GB
Over 5000 TB / month   $0.055 per GB      $0.037 per GB


So, this is more than Carbonite at my data volume, but more reasonable at the “reduced redundancy” pricing.  Reduced redundancy is perfect for my use case since I already back up all my files to an external hard drive and this really is a disaster recovery scenario.  So for me, this will run around $84 a year.  Still expensive, but historically S3 prices also go down at least twice a year.  We'll see how it works out. At the very least, it's cool.
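To sanity-check that estimate, here's a rough back-of-the-envelope calculation in Python. The 75 GB figure is my own assumption about a typical photo library; it happens to land right at the $84/year number using the first-tier reduced redundancy rate from the table above:

```python
# First-tier S3 rates from the pricing table above ($ per GB per month).
STANDARD_RATE = 0.140
REDUCED_RATE = 0.093

def annual_cost(gb, rate_per_gb_month):
    """Yearly storage cost assuming a flat per-GB monthly rate."""
    return gb * rate_per_gb_month * 12

photos_gb = 75  # assumed library size -- adjust for your own data
print(round(annual_cost(photos_gb, REDUCED_RATE)))   # ~ $84/year reduced redundancy
print(round(annual_cost(photos_gb, STANDARD_RATE)))  # ~ $126/year standard
```

Note this ignores upload bandwidth (free) and any download charges, so it is a floor, not an exact bill.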

Another option worth considering is Amazon's new “Cloud Drive”.  The prices are lower than S3, with 5 GB free and other tiers at $10/GB per year.  The tools are a little clunky right now as it is really aimed at working with music.  If you are mostly worried about backing up music, Cloud Drive makes it completely simple with their music upload and streaming tools.  For other file types it's a little more manual.  But, the price is right.

Back to exploring S3.  First, we need to check out the tools available for managing S3.  At this point, I was feeling very cheap since the storage costs are a little more than I wanted in the first place.  There are some good tools out there like Jungle Disk that would likely make this much easier, but I was looking for cheap as opposed to easy.  With Jungle Disk, you could take the complexity of the rest of this solution down considerably.

First step is to go to Amazon and create an Amazon Web Services account.  You probably already have an account and you can use the same login.  Then login to Amazon Web Services and create an S3 bucket.

For syncing files to S3, I found an attractive free option in s3sync, a Ruby gem that gives us a command line way to sync between my Mac and S3.  Here's a great blog entry on the Ruby gem installation and config, so I won't repeat that part.  Then, to back up your files, use a command similar to this:

s3sync -r -v /Users/YOURUSERNAME/Pictures/iPhoto\ Library/Originals/2011 yourbucket:iPhotoBackup/Originals

The above will copy the photos out of iPhoto on your Mac that were taken this year (2011) into your bucket in the folder Originals. You'll need to create the folder structure iPhotoBackup/Originals before executing this command.  You could also leave off the “/2011” and the “/Originals” like this to back up your entire iPhoto library, but this is going to take a very long time to upload to S3:

s3sync -r -v /Users/YOURUSERNAME/Pictures/iPhoto\ Library/Originals yourbucket:iPhotoBackup

With the -v option you see each file listed as it is uploaded.  Like Carbonite, this will also take quite a while, and during the upload a lot of your internet bandwidth will be consumed, such that Netflix on demand, web browsing, etc will be slow for everyone in the house. Not surprising, just thought I'd throw that out there.  This is a good reason to do it directory by directory, perhaps overnight, until you have it all complete.
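To make the directory-by-directory approach less tedious, a small wrapper can generate one s3sync command per year folder and run them back to back overnight. This is just a sketch of the idea – the paths and bucket name are the same placeholders as above, and the dry_run flag here is my own addition so you can preview the commands before committing your bandwidth:

```python
import subprocess

# Placeholder paths -- substitute your own username and bucket.
ORIGINALS = "/Users/YOURUSERNAME/Pictures/iPhoto Library/Originals"
DESTINATION = "yourbucket:iPhotoBackup/Originals"

def backup_commands(years):
    """Build one s3sync invocation per year folder."""
    return [
        ["s3sync", "-r", "-v", f"{ORIGINALS}/{year}", f"{DESTINATION}/{year}"]
        for year in years
    ]

def run_backups(years, dry_run=True):
    for cmd in backup_commands(years):
        if dry_run:
            print(" ".join(cmd))  # preview only; no upload happens
        else:
            subprocess.check_call(cmd)  # runs each year's sync in sequence

run_backups([2009, 2010, 2011])  # dry run: prints the three commands
```

Kick it off with dry_run=False before bed and the uploads run one after another while the house is off Netflix.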

The next step is very important to save you $$$s.  You need to go to your Amazon Web Services Console, explore your S3 bucket, right click on the folder you just uploaded, and select Properties (or select the Properties button at top right).  From there, you need to select “Reduced Redundancy” and Save.  This will then iterate through all the items in the bucket and mark them for reduced redundancy.  There is no way to select this as the default for all files uploaded to a bucket.  Hmmmmm, I wonder why?  Greedy a bit, Amazon?

If you are a Windows user, you may want to check out Cloudberry Explorer.  They have a nice S3 interface that supposedly can mark each file for reduced redundancy after uploading for you.  Looks like an interesting option.

There is quite a bit more to know about S3 than contained in this blog.  For example, you can make selected files or folders public and hand out URLs, etc.  Also, Amazon doesn’t charge you for transfer bandwidth on the upload, but does on the download.  There are many other considerations to think through in choosing a cloud backup solution that is right for you, but hopefully you find this informative and useful.


Agile Software and Cloud Computing and Leadership and Teams31 Aug 2011 08:30 pm

Reaching the Summit

Today was an especially good day.  Few things can compare to seeing the combined efforts of a large team who has worked so hard together towards a shared vision finally reach that goal.  Like hikers on a long trek finally reaching the summit, today we took time to survey the terrain we conquered, thought a little about the road ahead, but still enjoyed the moment.  Building SaaS based products with Agile process can result in a relentless pace, so these moments are special.

I’ve had the pleasure of leading terrific teams of very talented people in delivery of software projects, and this day, I have the privilege of leading the best of those.  The scale of innovation, integration, and imagination that as a team they delivered is a tribute to their commitment to our company and our clients.  For me, it is a huge thrill to see what months ago was a set of ideas turned into a quality solution.

Today, in our team meeting, I tried to convey the importance of this accomplishment.  Ten years from now, each of us may look back on this day, proud and maybe a little amazed of what we did together.  There’s a very good chance many of us will never deliver so much innovation in one day again on this scale.  I’m proud of our team, grateful for their efforts, and hopeful many non-profits will benefit from their hard work.

Cloud Computing and Tech News08 Dec 2010 10:47 am

Bernard 9:20 Marc Benioff has just taken the stage – quite a showman he is. Nice video overview of day 1. Benioff now talking about Microsoft as the evil empire – says they are trying to stop Chatter, the Sales Cloud, and his new socks. Talking now about the Microsoft lawsuit – and the Microsoft “protest” outside yesterday. (MS had Segway drivers out front with “Don't Get Forced” slogans.)

Benioff asks the crowd to help him get this customer back and asks Bernard to come on out…  It's the guy in the MS advertisements – on the trucks and Segways.  “Bernard I'm so sorry…. we want to apologize to you”  “We don't want you to go back to software, the constant upgrades, waiting for new features…”  Crowd applauds and Bernard says he will come back.  Classic Benioff.

9:38 Benioff now talking about his donation of $100M for a children’s hospital at UCSF. Describing all the investment in a new campus at Mission Bay.  Today is UCSF day at Dreamforce. Working to raise $1.5B for the new hospital. Still waiting for the announcements.

9:52 here we go with the meat of the keynote – “your platform is too proprietary” Benioff wants to open the platform further. Adding Ruby on Rails support. “Ruby is the true language of cloud 2” (talking about speed and agility of using the language). Salesforce buying Heroku!  Wow.

Heroku will be Salesforce's seventh cloud.  Will keep Heroku as an independent team. Heroku founded in 2007 to enable fast deployment/upgrade/delivery of Ruby apps. Fancy demo, but no details yet on how this will integrate.

Eighth cloud announcement: BMC Software – CEO Bob Beauchamp on stage now. Announcing Remedyforce, IT configuration management, now available on Force.com.

Platform team announcements – “Force.com 2” – going to build a killer product for each market:

appforce – departmental and collaborative applications. Looks like a marketing repackaging of what Force.com already provides; don't see anything new here.  Improvements to sharing – new sharing model to be delivered in the Spring. Configurable Visualforce pages, and governor limits reduced by 70% in Spring.

siteforce – integrated cms, point and click editor, prebuilt components, social and mobile built in, 24×7 availability. Killer UI on the new CMS for content creation and publishing.

vmforce – accenture on stage talking about vmforce. Accenture is investing in vmforce – I don’t see anything really new here and the whole thing feels like a little bit of a false start.  With the new acquisition of Heroku, I would be dubious of a large investment in vmforce.

isvforce – packaged apps on the appexchange – more marketecture.  Didn’t see anything new here.

Now have CEOs and CIOs on stage from Blackboard, Belkin, Avon, Kelly Services, and Deloitte talking about how the platform has helped them.

that’s a wrap.

Cloud Computing and Tech News07 Dec 2010 11:17 am

Marc Benioff speaking now – 14,000 in attendance at the keynote, 30K registered for conference.  Emphasizing platform and database in the cloud.  Waiting to see where he is going with this, what new announcements he has in his pocket…

Salesforce serves “100,000 customers, running on 1,500 Dell PCs,” Marc says. That's a gross oversimplification.  His point is that cloud computing is green because of more efficient use of resources – 90% more efficient than traditional hosting.

Talking about salesforce foundation now – asked all non-profits attending to stand up and be recognized – applause.

Broad change in internet usage.  Social networking users surpassed email users last year.  Significant growth in usage via smartphones. Cloud 2 is the shift from easy fast and low cost (Cloud 1) to social and mobile (Cloud 2). Interesting that Marc positions Amazon, Google, and eBay as Cloud 1.

If half a billion people are on Facebook, why aren't we building software that looks like Facebook?  Why isn't enterprise software like Facebook?  Focus on 6 clouds: Chatter and the Jigsaw data cloud, along with Sales Cloud 2 and Service Cloud 2.

Major changes in Force.com will be announced tomorrow, along with appforce to build apps and siteforce for sites.

Demoing Jigsaw – clean and update your contacts from the Jigsaw database; can also search for contacts at an opportunity.

CEO of Symantec on stage now. Arguing chatter is more efficient than email, claiming productivity going up. I privately wonder how this can be measured.  Is it just more information overload and more surface level communication without depth? I do get that it is useful to communicate very brief bits of info quickly.

“How do I get my whole company on chatter?” New product announcement – ChatterFree.  Whole company can be on chatter.  Great strategy – obviously trying to be the facebook of business. Admins have to enable it in the organization. Employees can be provisioned automatically or can be invited to join.  Includes chatter on mobile device.  Also, coming by end of year – free for everyone, generally available public site.

Demo of Service Cloud 2 – screen pop integrated with the telephone system, integrated with a knowledge base and a call script. Also demoed the availability of the KB via Google search. Agent-to-web-visitor chat is included in Service Cloud 2.  Demoed Twitter integration showing how Service Cloud can monitor tweets and create cases, including photos and location information, and send the answer via tweet.

Huge announcements coming tomorrow – vision of the leading platform for Cloud 2.0 applications – faster Cloud 2 apps, including Java. 185K apps today.  Big emphasis on Open – make it open.

Database announcement: they claim to have the most scalable cloud database – 25B DB transactions through Q3'11, 200B DB records, 12B custom tables, response times falling to around 275ms.  When they released Chatter, they extended the DB model to include a social data model (follow any entity) and recently published mobile / REST APIs.

Announcing today – the “first enterprise database for Cloud 2,” open to any language / platform.  Full relational DB, full text search index, user management, row level security, triggers and stored procs, authentication, APIs.  Elastic, auto-upgrade, auto-backup, auto-disaster recovery. Trusted and secure, SAS 70 Type II certified.

Demo: view of DB instances after login. Create a DB, graphical schema editor in a nice UI, showing a console running a select to prove the DB is online.  Showing VMforce Java code connecting to the database. Demoing a Facebook app querying a jobs database and uploading a resume to an open position – the Facebook app is a PHP app running on EC2. Demoing an Android app pulling job data. Finally, showing the recruiting app running on an iPad.  Sharing and security restricts visibility of job applications to the relevant departments. Showing collaboration in the interview process attached to a job application using Chatter-like social sharing.

Will be available next year. First 100K records are free, then $10/month per 100K records. Benioff promises even more announcements tomorrow. and Stevie Wonder tonight.  General Powell tomorrow?  Wow.

Cloud Computing and Tech News07 Dec 2010 10:52 am

Here waiting for the keynote to start…. It is at least double the size of the 2008 conference. I estimate this room can hold 10,000 people. Will be interesting to hear what Benioff announces today.

Cloud Computing and Mobile Computing and Tech News24 Apr 2010 06:33 am

It’s been a busy week as usual this time of year, but right in the middle of the week appeared my very own shiny Droid phone.  I’m still just a little in awe that Google has offered a new Android based phone to everyone at the conference.  Wow, what an impressive display of financial clout.  I feel it is a smart move – to get the people that obviously care most about what Google is doing interested in building Android apps.

I’m very interested in any announcements that might be made at the conference.  I feel relatively confident that Google is working on a tablet to compete with iPad as predicted in this blog in January. (Oh, that was my last post.  Like I said it is very busy this time of year!)

Will they follow Apple’s move and push a tablet based on Android?  Seems like a plausible move.  With the wood they have behind Android application development, this seems the logical course.

I can’t wait to build an app for this phone.  Yep, tech bribery works.  Good job Google.

Cloud Computing and Tech News02 Apr 2009 10:19 pm

Today, Amazon announced their Amazon Elastic MapReduce cloud web service.  A natural extension to their EC2 cloud services on one hand, and a somewhat startling event on the other.  In a recent post, I spoke of the global implications of the readily available low cost cloud computing infrastructure.  Now, it seems this service has entered the realms of massively parallel computing.

Not clear at this point what the limits of this service are, but the possibilities are staggering.  Not enough computing power in Pyongyang?  No problem, run your nuke simulations right here Mr Kim Jong-il. An extreme case, and perhaps too complex / compute intensive for this offering, but the point is never have resources of this scale been so readily available.  Amazing.

A better fit for tasks like SETI@Home, these distributed networks are useful in solving very large, data-intensive problems like web indexing.  Google even provides a nice tutorial on MapReduce.  There is also a nice class lecture on MapReduce from Cal Berkeley.  Enjoy.
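To make the programming model concrete, here's a toy word count in Python that mimics the map, shuffle, and reduce phases locally. This is only a single-machine sketch of the idea – services like Elastic MapReduce shard exactly this kind of work across many machines:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud", "the elastic cloud"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'the': 2, 'cloud': 2, 'elastic': 1}
```

The power of the model is that map and reduce are independent per key, so each phase parallelizes naturally across thousands of nodes.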