FUD with LinkedIn Passwords

I think some of the FUD that the BBC are spreading on the LinkedIn security breach needs addressing.

Don’t change your LinkedIn password … yet
Sure, change your LinkedIn password, but the important thing is to change it once the breach has been found and secured; otherwise it’s a bit of a pointless exercise.

Your password hasn’t been leaked
Someone has managed to get hold of 6 million hashed passwords. Having a hash of your password is not the same thing as having your password. Don’t worry too much about the hash of your password being known (assuming your password isn’t dictionary-based); it will take several years (if not longer) to work out what all the passwords are for 6 million people from just the password hashes. It’s worth keeping in mind that your encrypted password is sent over the Internet every time you log on to a website. Getting hold of the data that you send over the Internet isn’t hard; that’s why it’s often encrypted.
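To make the difference between a password and its hash concrete, here is a minimal Python sketch. It assumes an unsalted SHA-1 hash purely for illustration, and the example password is made up.

```python
import hashlib


def hash_password(password: str) -> str:
    """Return the hex SHA-1 digest of a password (illustrative only)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest()


digest = hash_password("correct horse battery staple")

# The digest is a fixed-length fingerprint, not the password itself.
print(len(digest))  # 40 hex characters for SHA-1

# Hashing is deterministic: the same input always gives the same digest.
print(digest == hash_password("correct horse battery staple"))

# A small change to the input gives a different digest.
print(digest == hash_password("correct horse battery stapl3"))
```

Nothing in the digest can be read back as the password; an attacker has to guess candidate passwords and hash each one to see whether it matches.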

The lessons

  • Use different passwords for different sites. If that sounds like too much effort then at minimum you should use a unique password for your Internet banking, email and any site that holds your credit card details.
  • Don’t use dictionary-based passwords because the hashes of these are well known.
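The second lesson can be illustrated with a rough sketch of a dictionary lookup. The word list here is hypothetical and tiny, and unsalted SHA-1 is again an assumption made for the example.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Return the hex SHA-1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Hypothetical word list; real lists contain millions of entries.
WORDLIST = ["password", "letmein", "linkedin", "monkey", "dragon"]

# Precompute hash -> word once; the table can be reused against every leaked hash.
LOOKUP = {sha1_hex(word): word for word in WORDLIST}


def crack(leaked_hash: str):
    """Return the matching dictionary word, or None if it is not in the table."""
    return LOOKUP.get(leaked_hash)


print(crack(sha1_hex("linkedin")))             # found with a single lookup
print(crack(sha1_hex("k9#Vq!2zLp_unlikely")))  # None: not in the word list
```

A password that appears in the word list is recovered with one dictionary lookup, whereas one that does not has to be found by guessing.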

Web Application Development with Elastic Beanstalk

OK, so you’ve had a brilliant idea and now you are going to build your Web Application on Amazon Web Services (AWS). Before you spend any time designing a Web Server platform, stop and consider: does Elastic Beanstalk already have everything you need?

Creating a Web Server Platform is Hard

Creating a reliable and scalable Web server platform is hard; as a result people often take an iterative approach to it.

Often the first step is to launch an EC2 instance, install all the third-party server software needed, configure it and then deploy your new application.

In my experience this is as far as a lot of people get, but this is not a reliable solution because there is no redundancy. Ideally, when the web server fails another one should be ready to take its place. To do this you could run an extra web server instance in another Availability Zone and put both of your web servers behind an Elastic Load Balancer.

For most businesses in a traditional hosting environment it wouldn’t be conceivable to think about creating multiple servers in multiple data centres. The cost would simply be prohibitive.

OK, so now you’ve got redundancy, but what if you build the next Twitter? Well, you can always scale up or scale out when you need to, but this will be a painful experience because in all likelihood you won’t know that you need to scale until it’s too late. What if sometimes you have thousands of visitors and sometimes you have none? The best way to deal with this situation would be to use Auto Scaling to scale out on demand: when the number of visitors is high you will be running many instances, and when the number of visitors is low you will only be running a few instances.
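The scale-out-on-demand idea can be sketched as a simple decision rule. This is not how Auto Scaling is implemented; the function name, the thresholds and the choice of CPU as the metric are all assumptions made for illustration.

```python
def desired_instances(current: int, avg_cpu: float,
                      minimum: int = 2, maximum: int = 10,
                      scale_out_at: float = 70.0,
                      scale_in_at: float = 30.0) -> int:
    """Return how many instances to run given average CPU utilisation (%)."""
    if avg_cpu > scale_out_at:
        target = current + 1   # busy: add an instance
    elif avg_cpu < scale_in_at:
        target = current - 1   # quiet: remove an instance
    else:
        target = current       # within the comfortable band
    # Never drop below the minimum (redundancy) or exceed the maximum (cost).
    return max(minimum, min(maximum, target))


print(desired_instances(current=2, avg_cpu=85.0))   # 3
print(desired_instances(current=2, avg_cpu=10.0))   # 2 (held at the minimum)
print(desired_instances(current=10, avg_cpu=95.0))  # 10 (held at the maximum)
```

Keeping the minimum at two preserves the redundancy described earlier, while the maximum caps the bill if traffic spikes.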

You are also going to need to create a development environment for your developers as you can’t really let them develop directly onto the live servers. It is also useful to create a staging environment which has the next release of your web application for stakeholders to preview.

All of this is great until the day comes when you need to deploy the next version of your web application. You are going to have to revisit how your Auto Scaling works. You could create an Amazon Machine Image (AMI) which has all the third-party software, configuration and the new version of your software on it. However, this isn’t a very sustainable approach, so you are probably going to have to look at Puppet or a similar configuration management tool to automatically configure instances that are launched by Auto Scaling. You will also have to do something to deploy the correct version of your web application to each web server launched by Auto Scaling. The correct version of your web application will depend on which environment the web server instance is configured for (Development, Staging or Live).

Once you have done all of this you will have three environments with a dynamic number of EC2 Instances. That’s a lot of configuration and a lot of moving parts to manage. At this point you are probably quite averse to change because it has taken so long to get this far. You might even be forgiven for ignoring a few security vulnerabilities in your application stack because you simply haven’t got the time to go through the process of creating a new patched AMI for all of your environments.

If all this seems complicated, then I have bad news for you: I have not described everything you need to do. In reality there is probably a lot more that you need to consider.

AWS gives you all the tools you need to do these things but you need to stop and ask yourself whether creating a web server platform using Infrastructure as a service (IaaS) is worthwhile.

Elastic Beanstalk is the Answer to Platform Complexity!

Elastic Beanstalk was created to save you the effort of building your own web server platform. You don’t need to worry about creating a web server AMI and keeping it up to date with the latest security patches because Elastic Beanstalk already does this for you.

You can also forget about scaling and redundancy issues because these are built into Elastic Beanstalk; you just need to set the minimum and maximum number of web server instances that you wish to run and the metric that you wish to scale on (CPU, bandwidth, etc.).

Creating development, staging and live environments is very simple: each application can have multiple environments and each environment can have a different configuration. You can set environment variables, such as database connection strings, which are exposed to each instance that is created in that environment. For example, web server instances in your development environment might connect to your development database and web server instances in your live environment might connect to your live database.
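From the application’s side, picking up a per-environment setting might look something like the sketch below. The variable name DATABASE_URL and the connection strings are hypothetical, and exactly how a value is exposed to your code depends on the software stack you choose.

```python
import os


def database_url(environ=os.environ) -> str:
    """Read the connection string supplied by the environment.

    DATABASE_URL is a hypothetical variable name chosen for this example.
    """
    try:
        return environ["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL is not set for this environment") from None


# Simulated environments; in practice the platform sets the variable.
development = {"DATABASE_URL": "mysql://dev-db.example.com/app"}
live = {"DATABASE_URL": "mysql://live-db.example.com/app"}

print(database_url(development))
print(database_url(live))
```

The same build of the application can then run unchanged in every environment, with only the configuration differing.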

Versioning is also taken care of with Elastic Beanstalk. You can specify the version of your web application for each environment. Elastic Beanstalk will make sure that the specified version is deployed to all web server instances in that environment.

What’s missing?

As I mentioned previously, in my post about Amazon’s Simple Email Service (SES), I would really like to see AWS offer the ability to configure the Elastic Beanstalk environment with SES SMTP credentials. This would mean that the local SMTP server on each instance in your Elastic Beanstalk environment would have a Smart Host that was pointing to the SES server. This would remove the need to create code in your application that locks you into SES.

Getting HTTP logs for analysis is tricky with Elastic Beanstalk; you either have to roll your own solution or rely solely on page tagging services like Google Analytics. I’d like to see HTTP and HTTPS logs for Elastic Load Balancers, which would remove the need to create your own solution in Elastic Beanstalk.

Elastic Beanstalk currently offers Java, PHP and .NET software stacks. This is sensible because these are the stacks that most people are using but I suspect a lot of people would like to see other software stacks supported in the future.

Conclusion

Infrastructure is no longer a problem you have to worry about for new web applications. All new web applications should be developed for Platform as a Service (PaaS) offerings like Elastic Beanstalk. Legacy applications that are not currently hosted in the cloud can easily be ported to IaaS as a first step towards PaaS.

A lot of people worry that creating new applications for PaaS offerings such as Elastic Beanstalk creates vendor lock-in for your application. This needn’t be the case; if you have designed your application correctly then you won’t have locked yourself in, because all the components of Elastic Beanstalk are simply standard pieces of software that you are already using. You could create your own Elastic Beanstalk service anywhere you like, but it would be a strange thing to do because it would take time and it would be expensive.

There is an argument that PaaS doesn’t work because there isn’t a single one-size-fits-all solution to all problems. AWS has a neat answer to this: they let you create your own AMIs for Elastic Beanstalk to use. However, I would argue that progress in computing has always been measured in the flexibility that has been taken away; this is a good thing and not something we need to fight. Someone else has figured out all the best practices of running a web application on a platform; all we need to do is use it. We don’t consider writing our own programming language, Web Server, Application Server or MVC Framework every time we develop a new Web Application, so why should we consider creating our own platform?

Ultimately PaaS offerings like Elastic Beanstalk are the future because they allow businesses to focus on building web applications and not get distracted by infrastructure issues; it is the web application you create which differentiates your business from the competition not the platform you run on.

Get your Emails read by using Amazon’s Simple Email Service

If you are using Amazon Web Services (AWS) to host a Web Application that sends Emails then you almost certainly need to be using Simple Email Service (SES).

Your IP address might have a dirty history

Public IP addresses (of the IPv4 variety) are a scarce resource and AWS has to recycle them. EC2 instances are allocated a public IP address on start up which will stay the same until the instance “is stopped, terminated or replaced with an Elastic IP address”.

On any hosting platform, cloud or traditional, IP addresses tend to pass through a lot of hands before they reach you and in some cases the IP address that you have been allocated has been used for sending spam emails and is therefore on someone’s blacklist.

If you are not sending emails from your EC2 Instance to the outside world then this isn’t a problem. However, if you wish to send emails from your instance then you will need to check your instance’s public IP address on services like Spamhaus, Trend Micro and the Composite Blocking List (CBL). This isn’t an exhaustive list of spam blacklisting services; in all likelihood you will come across many others.
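Many blacklists can be queried over DNS: you reverse the octets of the IP address, append the list’s zone and look the name up. The sketch below shows the idea in Python. The zone names are ones I believe to be in use (zen.spamhaus.org for Spamhaus), but check each list’s own documentation, and note that the IP address shown is a reserved documentation address.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name used to look up an IPv4 address in a blacklist."""
    address = ipaddress.IPv4Address(ip)  # raises ValueError if not valid IPv4
    reversed_octets = ".".join(reversed(str(address).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist returns a record for the address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # no record (or the lookup failed): treat as not listed


print(dnsbl_query_name("192.0.2.10", "zen.spamhaus.org"))
# 10.2.0.192.zen.spamhaus.org
```

A more careful check would inspect the address that comes back, because lists use different return values to signal different things, including errors such as querying through an unsupported resolver.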

Getting your instance’s public IP address de-listed from a spam blacklist can be a challenging experience. Trend Micro, for instance, require that your IP address is marked as a static IP address before they remove it from their blacklist. To do this you will need to allocate an Elastic IP Address to your instance and then configure the reverse DNS for that IP address.

You will need to check the history of every public IP address that you want to send emails from. If the number of instances that you are operating is changing dynamically using Auto Scaling then this could be a challenging task. One solution might be to create a single instance as your public SMTP server and then configure all your other instances to use that as a smart host.

Lastly if you are sending a substantial number of emails from your EC2 instance then you will need to apply for your email sending limits to be removed.

It’s possible to send emails from an EC2 instance without fearing that it will be marked as spam, AWS give you all the tools that you need, but it might just be more effort than it’s worth.

Simple Email Service to the Rescue!

Simple Email Service (SES) saves you the effort of creating your own email platform for your applications. You don’t need to worry about keeping IP addresses off blacklists because AWS takes care of that for you. You also don’t need to worry about availability; SES runs on multiple availability zones, meaning that you are very unlikely to ever experience a total failure. SES is scalable: assuming that you are trustworthy, you can send tens of thousands of emails through SES very rapidly. And best of all, SES is probably going to cost you next to nothing.

You can connect to SES using an SMTP Interface or through an API that can be used from most languages such as Java, .NET, PHP, Ruby, etc. It is likely that your existing application uses SMTP, so the SMTP Interface is probably going to be easiest to implement to begin with. If you do choose the SMTP route you may want to configure your existing email server to use SES as a smart host, because this abstracts the process of authenticating to SES away from your application, which means that you don’t need to change the code of your application.
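If you would rather talk to the SMTP Interface directly from your application, a minimal Python sketch might look like this. The host name follows the pattern I understand SES to use for its regional SMTP endpoints, and the addresses and credentials are placeholders; treat all of them as assumptions to check against your own account.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Create a plain-text email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_via_smtp(message: EmailMessage, host: str, username: str,
                  password: str, port: int = 587) -> None:
    """Send over an authenticated, STARTTLS-protected SMTP connection."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(message)


message = build_message("noreply@example.com", "user@example.com",
                        "Welcome", "Thanks for signing up.")
print(message["Subject"])

# Not executed here: the credentials below are placeholders, not real values.
# send_via_smtp(message, "email-smtp.us-east-1.amazonaws.com",
#               "SMTP_USERNAME", "SMTP_PASSWORD")
```

Because this is plain SMTP, nothing in the code is specific to SES apart from the host name and credentials, which is the same property that makes the smart host approach attractive.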

What’s Missing?

I’m convinced that Platform as a Service offerings like Elastic Beanstalk are the future. So I would really like to see AWS offer the ability to configure the Elastic Beanstalk environment with SES SMTP credentials. This would mean that the local SMTP server on each instance in your Elastic Beanstalk environment would have a Smart Host that was pointing to the SES server. It’s worth pointing out that this can be achieved today manually by creating your own custom AMI for Elastic Beanstalk.

In Conclusion

I’d like to stress again that IP addresses having a dirty history is true of any hosting platform; I just highlight the problem here because AWS provide you with SES as a simple solution to do something about it.

Basically it is likely that a team of developers and support engineers can make a better job of creating an email platform than you can.

If you are currently using AWS then SES is probably the logical choice and even if you are not using AWS today you should strongly consider SES for sending Emails from your Web Applications.

Please feel free to share your thoughts with me.

Amazon Web Services – The Missing Features

I like Amazon Web Services (AWS), the rate of change and improvement is staggering but there are still things that they can do to make it even easier for people to create cheap and highly available systems.

HTTP and HTTPS Logs for Elastic Load Balancers

If you have multiple Web Servers behind an Elastic Load Balancer then you have a problem when it comes to log file analysis. Each server is going to create its own set of logs, so in order to analyse them you are going to have to merge your separate logs together using something like logresolvemerge from AWStats. If you are using Auto Scaling you have the additional problem that servers can disappear depending on load.
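To show what merging involves, here is a rough Python sketch that combines two logs that are each already in time order. It is not how logresolvemerge works internally; the log format, with an ISO 8601 timestamp at the start of each line, is an assumption made to keep the example short.

```python
import heapq
from datetime import datetime


def timestamp(line: str) -> datetime:
    """Parse the leading ISO 8601 timestamp of a log line (assumed format)."""
    return datetime.fromisoformat(line.split(" ", 1)[0])


def merge_logs(*logs):
    """Merge already time-ordered logs into one time-ordered list."""
    return list(heapq.merge(*logs, key=timestamp))


server_a = [
    "2012-06-01T10:00:00 GET /index.html 200",
    "2012-06-01T10:00:05 GET /about.html 200",
]
server_b = [
    "2012-06-01T10:00:02 GET /index.html 200",
    "2012-06-01T10:00:09 GET /missing 404",
]

for line in merge_logs(server_a, server_b):
    print(line)
```

The merge itself is the easy part; the hard part is collecting the logs from every server before Auto Scaling terminates it.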

There are several solutions to this problem such as using a third party logging service like Loggly or by solely relying on page tagging services like Google Analytics but any solution you create is going to add complexity to your architecture which will require maintenance and increase costs.

[Figure: Load Balancer with HTTP and HTTPS Logging Concept. How the Load Balancer set up page could look with logging.]

I’d like to see AWS create a new feature which allows you to log all HTTP and HTTPS traffic that passes through a load balancer. It should allow you to specify an S3 Bucket and target prefix just like S3 Logging currently allows. It would make sense if the log format was the same as the S3 logs but in the future I’d like to see IIS and Apache log formats supported. Logging on the load balancer would reduce a large amount of complexity in a load balancing system. It would also allow for 3rd party developers to create analytic tools that could be purchased on the AWS marketplace.

Elastic Beanstalk in the EU region

Elastic Beanstalk and other app engines are the future. One day, very soon, no one will need to be a system architect. But that day is a little while off and part of the reason for that is that Elastic Beanstalk isn’t available in the EU region. This is a problem for companies that have regulatory compliance issues which mean that data can’t leave specific regions. The simple solution would be for AWS to make Elastic Beanstalk available in more regions!

More Regions

AWS opened my eyes to working with one hosting provider across multiple regions. On many projects I’ve worked on I’ve had to solve latency issues or had regulatory issues to comply with. Having one provider for EU, US, etc. is fantastic, but now I’ve seen how great it is I want more! Australia, New Zealand and Canada would be a great start because they are already developed markets. I suspect I won’t have to wait too long for new regions.

S3 bucket sizes

S3 is great, and so easy to use that most companies seem to be storing an ever increasing amount of stuff on there; log files, backups and static media are all good use cases. However, there is a price (albeit small) for all this data. It would be great to get a quick overview of the total number of objects and the total size of a bucket from the console.
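Until the console offers this, the figures have to be worked out by listing every object and adding up the sizes. The Python sketch below shows only the adding-up step. It assumes each object record is a dictionary with a "Size" key in bytes, and the listing itself is made up.

```python
def summarise_bucket(objects):
    """Return (object_count, total_bytes) for an iterable of object records.

    Each record is assumed to be a dict with a "Size" key holding bytes.
    """
    count = 0
    total = 0
    for obj in objects:
        count += 1
        total += obj["Size"]
    return count, total


# Hypothetical listing; a real one would come from paging through the bucket.
listing = [
    {"Key": "logs/2012-06-01.log", "Size": 1_048_576},
    {"Key": "backups/db.sql.gz", "Size": 52_428_800},
    {"Key": "static/logo.png", "Size": 20_480},
]

count, total = summarise_bucket(listing)
print(count, total)  # 3 53497856
```

For a bucket holding millions of objects the listing alone means a lot of requests, which is why a summary in the console would be so welcome.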

Route 53 to Support S3 Hosted Sites on the Root Domain

At the moment there is a movement towards (or backwards, if you are old enough to remember the early days of the web) creating static websites using tools like Octopress. Static sites often make a lot of sense; there is normally very little on the average blog that really needs to be dynamic. By hosting a static site on S3 you can avoid instance costs and create a highly scalable website without any pain. You could easily create a static site that was viewed by millions of people a day for a running cost of a few dollars a month.

The only catch is that Route 53 doesn’t allow you to host sites on the root domain using S3. This means you are forced to use a subdomain prefix such as www. You might not agree with no-www.org but you should probably do something with your root domain.

Werner Vogels is using a redirect for his site All Things Distributed so if you enter allthingsdistributed.com it will redirect you to the S3 hosted site at www.allthingsdistributed.com.  This is fine but it doesn’t feel like an elegant solution.

I’d like to see Route 53 allow S3 hosted sites on the root domain.

Auditing for Identity and Access Management (IAM)

IAM allows you to control what level of access to your AWS resources your users have. For instance you can specify that users have the ability to restart instances but deny them the ability to terminate instances.
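As an illustration, a policy along those lines could be written as below. I have built it as a Python dictionary and printed it as JSON; the action names are the EC2 ones I would expect to use, and mapping “restart” to a reboot is my assumption.

```python
import json

# Allow rebooting instances but explicitly deny terminating them.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "ec2:RebootInstances", "Resource": "*"},
        {"Effect": "Deny", "Action": "ec2:TerminateInstances", "Resource": "*"},
    ],
}

print(json.dumps(policy, indent=2))
```

An explicit Deny takes precedence over any Allow, so users with this policy cannot terminate instances even if another policy grants them broader EC2 access.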

IAM is great for controlling access but it doesn’t allow you to answer the questions of who did what and when they did it. Highly available systems are fine but they can still be brought down by employees making mistakes; having an audit trail will often help identify training issues.

Open Source NoSQL Service

Most of Amazon’s services, such as ElastiCache and Elastic MapReduce, have mature open source equivalents which many companies are already using. For some reason (I suspect political) AWS chose not to take this route when creating a NoSQL Service and instead created DynamoDB, which has no open source equivalent. Not having an open source equivalent is possibly limiting the adoption of DynamoDB because System Architects are wary of tying themselves to a single vendor. It would be great to see something like MongoDB or CouchDB available as a service from AWS. Ideally AWS would create a NoSQL Service like their own Relational Database Service, which allows for multiple different engines.

More RDS Database Engines

I’d love to see PostgreSQL available as an RDS Database Engine option.  In the last 12 months Oracle and Microsoft SQL Server have been added which is great and indicates that over time we can expect to see more relational databases made available as a service.

A Roadmap

On numerous occasions I have seen a gap in the AWS offering and spent time creating my own solution, only to find that AWS launch their own solution a few weeks later! This is great because it shows the fantastic rate of progress that AWS are making and that they are listening to their customers, but I have also wasted several weeks of my life creating solutions that are redundant!

I would love a roadmap, maybe giving a preview of the features that are coming in the next month.

In Conclusion

I hope you have found this post interesting, I’d love to hear your thoughts and experiences of using AWS.