I’ve been pondering a lot lately how the Docker ecosystem will evolve and what its impact will be on the larger infrastructure market. My current day job has allowed me, over the past 8 months or so, to spend almost 100% of my time on R&D efforts around Docker. As a result, I’ve had a lot of time to think about these things. In my mind I see three phases through which Docker could evolve. It isn’t obviously trending the way I envision, and the purpose of this post is to articulate the subtle differences between where I see it headed and where I think it should go, and why those differences matter.
The phases through which I think Docker should evolve can be described as, first, Docker as an application package; second, Docker as a unit of orchestration; and, finally, a container cloud. While all the hype in the media says containers will revolutionize the cloud, based on where I see it headed today, it won’t, at least not by my standards. But it can, and that’s what I’d like to explain. We have an incredible opportunity to completely turn the infrastructure business on its head. So stick with me here, it gets good at the end.
Docker as an Application Package
The first phase is where we are today, and it’s what made Docker so popular. The basic concept of using Docker as an application package is to put Docker on a very similar level to rpm and deb. To run nginx on your server today, you log in, run “apt-get install nginx”, and then run nginx. With Docker you can now log in and run “docker run nginx” or “docker run my-nodejs-app.” Docker just becomes a more portable, easier-to-use rpm/deb. That terse explanation undersells the incredible benefits Docker brings to application packaging, deployment, and management, but conceptually I think it’s a very useful analogy.
At this phase of evolution the impact on the larger infrastructure space is small. Basically, at this level of maturity you can expect Docker to become a very popular DevOps tool that has an impact similar to something like Puppet or Chef. Puppet and Chef are great tools and do enable you to run infrastructure better, but the impact of those tools can’t be compared to something like AWS, which was an absolute game changer.
The talk of containers changing the world, breaking AWS’s stronghold on the IaaS market, and other grandiose claims will not come true if we don’t evolve past this simple use case. The reason is quite basic. If Docker remains merely a means of packaging, you are running the same infrastructure you run today, just packaging the applications in a different manner. One argument people make is that Docker will drive a trend back towards bare metal. AWS enabled a paradigm shift in which the unit of deployment became the server. With Docker, you’re now able to move back up the stack and make the unit of deployment the container, with even more agility than what AWS originally enabled. This means you aren’t creating and destroying servers as often, and from this perspective buying bare metal becomes more attractive.
If you are running Docker on AWS today you are most likely using some aspect of EBS, security groups, ELB, and maybe some VPC functionality. Therein lies the problem. With containers you still need persistent storage and snapshots, load balancing, firewalling, and other assorted things. When you move to bare metal, you need to solve all of these problems again. There’s nothing magical about applications in Docker that would suddenly negate the need for something like EBS. If your application uses EBS today, then your application in Docker will still need EBS. In practice, what I think will happen is that people will just move to larger, more static instances on AWS. To a certain degree this could help AWS, because the small t1, t2, and m1 instances most likely have the lowest margins.
Just because you can package an application in a better fashion doesn’t mean you’re going to change the face of the cloud.
The next level of evolution that I think Docker should enter is that of orchestration. Before I address what I mean by orchestration, I’m going to talk first about the last phase, the container cloud, because it seems to be the logical next step for most people. Most higher-level orchestration systems are really about building a container cloud.

The Container Cloud
A container cloud can be described easily by saying the level of abstraction becomes the container. In IaaS today, the level of abstraction is the virtual machine (VM). The VM runs on some physical hardware, but as the consumer of the cloud you have no visibility or access to the physical hardware. This is what makes the cloud a cloud. It just runs somewhere and you shouldn’t need to concern yourself with that. In a container cloud, similar to the VM, the container runs on something and you have no visibility/access to the underlying platform.
Most Docker orchestration tools I see today gravitate conceptually towards this model and maybe one day we’ll get there. Right now, I think there are some very practical issues in going towards this model. The first is security, but that is less of a concern to me. The second is a general question of usability and the appeal to the user.
Regarding security: as it stands today, Docker is not secure for multi-tenant environments. This is well known. If you wish to run multi-tenancy you have three basic options. The first option is to run a restricted set of Docker functionality. This means not allowing root, controlling the image, and probably adding more SELinux rules. If you are running a PaaS or another specific use case, these restrictions may be just fine for you. This is essentially how Heroku can run container technology today in a multi-tenant environment. The second option is to use physical machines or virtual machines as your security boundary. When a customer goes to deploy a container, they buy a 1GB bucket, and that bucket ends up being a 1GB VM running somewhere. If you’re actually spinning up VMs, you’re then tying a group of containers to a specific host. I don’t think that’s really what the user wants. The third option is to just ignore the problem and assume somebody will figure it out. Option three is quite common.
Assuming security eventually gets fixed, or that the current security is sufficient for your needs, the much, much larger issue is the general usability of a container cloud. There’s an ongoing debate about whether you should run SSH in your container. The “right” answer is no, you should not. The practical answer is often “yes, I need to.” This basically sums up the issues of a container cloud. A container should really be a process. If I now have a running process somewhere in the cloud, how do I do runtime introspection of that process and provide other services like syslog, cron, monitoring, etc.?
If you don’t give access to the host system, you need to either run all the tools in your container, or the container cloud must build a large suite of tools to provide all the runtime introspection you need. If you run all your tools in the container, suddenly your containers are almost as fat as a VM. This really shouldn’t be the goal of the community.
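The host-access alternative is worth making concrete. From the host, a container is just a process, so ordinary Linux introspection via /proc works without installing a single tool inside the container. A minimal sketch (assumes a Linux host; the PID of a containerized process would come from something like `docker inspect --format '{{.State.Pid}}' <container>`):

```python
import os

def inspect_pid(pid):
    """Read basic runtime info for a process from the host's /proc.

    On the host, a container is just a process tree, so /proc exposes
    its cmdline, state, thread count, open fds, etc. -- no SSH daemon
    or tooling inside the container required.
    """
    base = f"/proc/{pid}"
    with open(f"{base}/cmdline", "rb") as f:
        # argv entries are NUL-separated, with a trailing NUL
        cmdline = f.read().split(b"\x00")[:-1]
    with open(f"{base}/status") as f:
        status = dict(
            line.split(":\t", 1)
            for line in f.read().splitlines()
            if ":\t" in line
        )
    return {
        "cmdline": [arg.decode(errors="replace") for arg in cmdline],
        "state": status.get("State", "").strip(),
        "threads": int(status.get("Threads", "0")),
        "open_fds": len(os.listdir(f"{base}/fd")),
    }
```

This is exactly the kind of thing a DevOps person reaches for when they log into the host, which is why locking them out of it is so contentious.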
If you build a large collection of tools so that the users of your cloud can introspect their containers, there’s a very good chance you’ll irritate the same audience you want to attract. It’s the same conundrum that plagues PaaS. PaaS targets developers so they don’t need to worry about all the underlying details of running their app in production, but by doing this it often restricts what the developer can do, and that makes the PaaS less attractive. The DevOps crowd is the one that loves Docker today. There’s a very good chance that DevOps people will not like the introspection tools provided by the “container cloud.” Instead they will want to just log into the host and run the tools they want. In the end, the success of the “container cloud” will probably require extending the audience past the DevOps crowd.
It will take quite a bit of time to overcome the issues I’ve outlined. Assuming we do overcome them, what will be the impact on AWS and others? I venture to say almost nothing. In this container cloud model you will go to a “container cloud” provider, or maybe run your own container cloud with some open source Docker orchestration tool.
The “big three” cloud providers today (AWS, GCE, Azure) will be the best suited to provide this “container cloud.” You still need to run large quantities of physical hardware, which they already have. Beyond the basic need for hardware, the other issue is the orchestration itself. As I mentioned earlier, if you’re using Docker today on AWS, there’s a very good chance you’re leveraging EBS, ELB, or VPC functionality. Containers still need storage and networking functionality, and that needs to be orchestrated in some fashion. It’s slightly different with containers, but it still exists. Either you run your Docker orchestration tool on AWS or you build all the orchestration of storage and networking separately. If OpenStack has taught us anything, it is that these orchestration tools are hard to build. AWS, GCE, and Azure have a huge advantage here. Orchestrating a VM or orchestrating a container carries with it all the same basic issues, and those are issues they know how to deal with. OpenStack has failed to put a dent in AWS, and many can argue that it has in fact helped them (or more specifically helped GCE and Azure). Building an open source container platform to replace AWS in the model I’ve just described will largely follow suit. The trend I’m seeing is that these container cloud systems will in fact just be built on top of AWS.
At the end of the day, given what I’m currently observing, if we do ever get to the model of a full container cloud, that cloud will probably be running on AWS. Yet again, we fail to really impact AWS.
Orchestration as a Service
From my perspective, given the current trends of how I see the Docker community playing out, there will not be a mass exodus from AWS. But there could be and let me explain how.
We need to find the happy compromise between the application packaging model and the container cloud; one that pulls the power away from AWS. Let me present a very, very subtle change to the “container cloud” model I’ve presented thus far. So subtle you’ll probably say, “duh, that’s what everyone is already doing,” but I’ll show you why that’s not true. There are two key parts to this. First, the level of abstraction is not the container; instead, the user still has access to the underlying server, whether physical or virtual. Second, you support a “bring your own server” model. I think the easiest way to describe this model is to walk through a hypothetical user experience.
A user goes to my-awesome-container-orchestrator.com and creates an account. They then go to AWS, SoftLayer, GCE, etc., get a physical server or VM, and register that server with my-awesome-container-orchestrator.com. Once that server is registered they can deploy containers on it, manage the Docker volumes, snapshot them, back them up, move volumes around, dynamically link containers across servers, dynamically map exposed ports, manage service discovery, set up security groups on the server, place containers on a private L2 with custom subnetting, add load balancing, add metrics and monitoring, add autoscaling, etc. Since the user owns this server, they can still log in. If they wish to run lsof or strace, or install additional monitoring tools, they can.
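The core of that flow can be sketched in a few lines. Everything below is hypothetical: the class and method names are invented purely to illustrate the shape of the “bring your own server” model, where the user supplies hosts from any provider and the service only orchestrates them.

```python
# Purely hypothetical sketch of the OaaS flow described above.
# No real API is being modeled; the names are invented for illustration.

class Orchestrator:
    def __init__(self):
        self.servers = {}      # server_id -> metadata for a user-owned host
        self.containers = {}   # container_id -> placement record

    def register_server(self, server_id, provider, address):
        # The server could come from AWS, SoftLayer, a colo, or a basement.
        # All the orchestrator needs is a reachable Linux host.
        self.servers[server_id] = {"provider": provider, "address": address}

    def deploy(self, container_id, image, server_id=None):
        # Place the container on a named server, or pick any registered one.
        if server_id is None:
            server_id = next(iter(self.servers))
        if server_id not in self.servers:
            raise ValueError(f"unknown server {server_id}")
        self.containers[container_id] = {"image": image, "server": server_id}
        return self.containers[container_id]

oaas = Orchestrator()
oaas.register_server("web-1", provider="softlayer", address="203.0.113.10")
oaas.register_server("web-2", provider="aws", address="198.51.100.7")
placement = oaas.deploy("nginx-front", image="nginx", server_id="web-2")
```

The point of the sketch is the separation of concerns: the orchestrator never owns or hides the host, so the user keeps root access on `web-1` and `web-2` the whole time.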
With this model we are basically providing Orchestration as a Service (OaaS). Since all that containers require is Linux, you really don’t care about the underlying hardware. Running CoreOS on AWS, GCE, a physical server in SoftLayer, some colo, or your basement is basically the same. With OaaS you basically create one gigantic cloud comprising all the servers in the world. This basic idea was tried in the past with VMs, but it fell apart because there is no portability between clouds. Docker and Linux now make it possible.
The scenario I just described may seem obvious and fall in line with much of the hype in the media about how Docker will change the world. But there are two subtle, yet very important, differences in the model I’m envisioning. The first has to do with how we build these orchestration systems; the second is how we treat storage and networking.
Decouple Orchestration from Infrastructure
First, orchestration systems need to be decoupled from ownership of the underlying infrastructure. This is required for the “bring your own server” model to work. In the past, hardware and orchestration have always come bundled together. There is no such thing as “OpenStack as a Service” that is not bundled with a specific hardware offering. The nature of VMs and IaaS makes this almost impossible; setting up a virtual machine cloud is very specific to the hardware environment. With Docker and containers we move up the stack from a hardware dependency to a Linux dependency. This means Orchestration as a Service is actually feasible without requiring any specific hardware environment beyond your standard x86_64 server.
Most orchestration systems being built today make the fundamental assumption that the infrastructure owner is also the owner of the orchestration system: whoever owns the VM or server running the containers is also in charge of managing the orchestration system. By own, I mean control. Some systems I’ve seen allow you to enter your AWS creds, but the VM they deploy on AWS is a black box to you. This is problematic. The first problem is that you are locking the user out of the host. Going back to the issues I described about annoying DevOps people, I think the most practical solution is to give access to the host. The second issue is that if you’re giving your AWS creds to some orchestration system, that system is tied to AWS or whatever the cloud provider is. You need to make it such that the user can obtain the server and register it. This opens the possibility of getting infrastructure from anywhere.
It is actually quite difficult to build an orchestration/IaaS system that assumes the host could be malicious. There are a lot of things to consider.
Storage and Networking
The second point is how we implement storage and networking. The great thing about Docker is that all you need is Linux. Docker runs wherever Linux runs, and that is how Docker achieves its incredible portability. This high degree of portability is what has sparked the notion that Docker will transform the cloud. The three basic building blocks of IaaS are compute, networking, and storage. Docker is basically a compute technology. As I keep mentioning, what will keep people tied to AWS is largely the storage and networking services they provide. Docker achieves its high portability because it only needs Linux. For storage and networking, we can take the same basic approach.
Linux, and the software that runs on Linux, has all the raw technologies needed to provide the same functionality that EBS, ELB, and VPC provide. If you look at Linux hypervisors today, the hypervisor largely takes care of virtualizing the CPU and giving access to storage and networking. The actual implementation of the storage and networking is provided by Linux technologies (OVS, GRE, VXLAN, bridging, ip/eb/arptables, dnsmasq, qcow2, vhd, iSCSI, NFS, etc). Even if you consider iSCSI or NFS, the actual implementation of the iSCSI or NFS server can be Linux. What is missing, to truly provide functionality on par with EBS, ELB, and VPC, is the complex orchestration that pieces together all of these individual raw technologies.
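To make the “missing orchestration” concrete, here is a sketch of how an orchestrator might compose a few of those raw primitives (a bridge, a veth pair, iptables NAT) into something VPC-like for one container. The device names, PID, and subnet are made up; a real system would actually execute these commands and handle teardown, races, and IP address management, but every step is plain Linux with no cloud API involved:

```python
# Sketch only: generates the command plan a hypothetical orchestrator
# would run to attach a container to a private L2 with NAT egress.
# All names, addresses, and the PID below are illustrative.

def wire_container_plan(bridge, veth_host, veth_ctr, ctr_pid, ctr_ip, prefix):
    """Return the shell commands to wire a container onto a private L2."""
    return [
        # One-time: a Linux bridge acts as the private L2 segment.
        f"ip link add {bridge} type bridge",
        f"ip link set {bridge} up",
        # Per-container: a veth pair, one end on the bridge, the other
        # moved into the container's network namespace (by PID).
        f"ip link add {veth_host} type veth peer name {veth_ctr}",
        f"ip link set {veth_host} master {bridge} up",
        f"ip link set {veth_ctr} netns {ctr_pid}",
        f"nsenter -t {ctr_pid} -n ip addr add {ctr_ip}/{prefix} dev {veth_ctr}",
        f"nsenter -t {ctr_pid} -n ip link set {veth_ctr} up",
        # NAT out of the private subnet -- raw-Linux stand-in for what a
        # managed VPC gives you.
        f"iptables -t nat -A POSTROUTING -s {ctr_ip}/{prefix} -j MASQUERADE",
    ]

plan = wire_container_plan("br-tenant0", "veth-h1", "veth-c1",
                           ctr_pid=4242, ctr_ip="10.42.0.2", prefix=24)
```

Eight shell commands get one container onto a private network with egress; the hard part, and the real product, is the layer that runs plans like this reliably across thousands of user-owned hosts.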
Let’s reimagine the cloud and start with the basic assumption that all I have available to me are empty Linux servers with basic L3 connectivity and additional block devices for storage. If I add a high degree of orchestration, I can take those basic building blocks and construct a full cloud with functionality on par with AWS.
I don’t know if I can emphasize this point enough. All you need is Linux, and you can get Linux from anywhere. This completely removes the stronghold AWS has on infrastructure. It is drastically cheaper to run physical servers. The value that AWS provides today is the additional services of EBS, ELB, VPC, etc. With containers you still need those services. But if Linux can provide the raw technology needed for that functionality, and the orchestration of that technology is provided as a service, you are then free to choose whatever infrastructure provider you want. There’s almost no reason to use AWS anymore. Use some local colo company. Then use some other colo in Singapore. You don’t need to go with one company that has a global footprint. Now any data center provider can compete with AWS. Throw in a global CDN like CloudFlare and you have an incredibly complete picture at a fraction of the cost of AWS. This approach is basically going to encourage small physical hosting companies to pop up that can operate at a lower margin than AWS can (unless they just lose money, as Amazon seems to be fine with).
I’m sure many are thinking that there are use cases in which you really do need some non-Linux technology, like a hardware load balancer or an expensive SAN. First, I would say the vast majority of use cases can be solved with Linux technologies, and the performance is acceptable. There are always cases where people need more. The model I’ve described is better suited to handle these requirements. For example, imagine that for whatever reason you really want to run NetApp. On AWS, if you don’t like what they offer, you’re out of luck. In the model I’ve described, since the user supplies the infrastructure, you are free to install NetApp in the same rack and use it however you choose.
I hope you can see that if we just slightly tweak our perspective we can have a huge impact. If we continue on the path I see today, I think we will just further enable the domination of the current cloud providers. The ideas I’ve presented are very complex to implement, but this is what I do and what I spend most of my time thinking about. I know it can be built, no doubt.