Cluster Management using Apache Mesos and Marathon

Kailash Verma
Apr 18, 2017

Cluster management using Mesos as the cluster manager and Marathon as its framework.

Before going into what cluster management and task scheduling are, let's look at a few things first: the problems before Mesos, static partitioning, dynamic partitioning, and so on.

Static Partitioning :

Static Partitioning for different applications.

In the diagram above we have racks of servers, with a fixed set of servers assigned to each application: MySQL, Cassandra, Rails, Hadoop and Memcached.

Now take a scenario where the Hadoop servers are fully utilized and Hadoop needs more servers. Even if the other servers are underutilized, we cannot give them to Hadoop, because they are already assigned to other applications.

What are the disadvantages here?

  • Resource utilization : We are not fully utilizing our existing servers.
  • Higher cost : New servers have to be added for Hadoop while existing ones sit idle.

Problems before Mesos :

  • Modern apps are distributed. How were they managed? Manually, by the Ops team.
  • Static partitioning of the cluster leads to low utilization (about 30%).
  • Developers : it is a challenge to build apps that scale elastically and also handle the faults that are inevitable.
  • Ops : have to manage and scale all apps individually.
  • There is no out-of-the-box way to manage NFRs (Non-Functional Requirements).

Types of Resource utilization :

  • Intra-machine resource sharing : Share a single machine’s resources between multiple apps (multi-tenancy).
  • Intra-datacenter resource sharing : Share multiple machines’ resources between multiple apps.

What is a Cluster Manager?

  • Provides a level of indirection between hardware resources (machines) and applications/jobs
  • It is the piece of software that does this mediation
  • Humans no longer schedule apps/jobs on the cluster nodes manually

What is Mesos?

Apache Mesos is an open-source cluster manager.

  • It was developed at the University of California, Berkeley.
  • It “provides efficient resource isolation and sharing across distributed applications, or frameworks”.
  • It enables resource sharing in a fine-grained manner, improving cluster utilization.

Mesos the Cluster Manager :

Dynamic Partitioning Using Mesos for all applications.

In the diagram above we again have racks of servers and applications. So what is the difference from the previous diagram? Here we have a common pool of servers that is used by all applications according to their requirements. If Hadoop needs more servers, it can use whichever servers are currently available, so we get much better resource utilization and reduce the cost.
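The difference between the two diagrams can be put into numbers with a small sketch. Everything here is an assumption made for illustration: the per-application demand, the fixed share of 4 servers each, and the helper names `serve_static`, `serve_dynamic` and `utilization` are all hypothetical and not taken from Mesos.

```python
# Hypothetical numbers for illustration only: how many servers each
# application needs right now, and the fixed share it owns under
# static partitioning.
demand = {"MySQL": 2, "Cassandra": 3, "Rails": 1, "Hadoop": 8, "Memcached": 1}
static_share = {"MySQL": 4, "Cassandra": 4, "Rails": 4, "Hadoop": 4, "Memcached": 4}
total_servers = sum(static_share.values())


def serve_static(demand, share):
    """Each app can use at most the servers that were assigned to it."""
    return {app: min(need, share[app]) for app, need in demand.items()}


def serve_dynamic(demand, total):
    """All apps draw from one shared pool until the pool is empty."""
    served, free = {}, total
    for app, need in demand.items():
        granted = min(need, free)
        served[app] = granted
        free -= granted
    return served


def utilization(served, total):
    """Fraction of all servers that are doing useful work."""
    return sum(served.values()) / total


static_served = serve_static(demand, static_share)
dynamic_served = serve_dynamic(demand, total_servers)

print("static :", static_served, utilization(static_served, total_servers))
print("dynamic:", dynamic_served, utilization(dynamic_served, total_servers))
```

With these made-up numbers Hadoop gets only 4 of the 8 servers it needs under static partitioning, while the shared pool gives it all 8, and overall utilization goes from 55% to 75% on the same 20 servers.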

Components of Mesos :

Mesos is a modern, general-purpose cluster manager with two components:

  • Mesos Master
  • Mesos Slave

Mesos — Design philosophy :

  • Kernel — does the resource allocation and sharing
  • Frameworks — task scheduling, execution & Fault tolerance

Mesos is like an OS/Datacenter kernel :

  • User space scheduling
  • Level of Abstraction
  • Builds and runs distributed systems using Resources
  • Build & run a PaaS on top of the Mesos kernel
  • Run Mesos on top of physical machines, EC2 or OpenStack [IaaS]

Solution Architecture : (The most important part of this page)

In this architecture we will be using Mesos as cluster manager and Marathon as its task scheduler.

Solution Architecture for Mesos using Marathon as its scheduler.

Description :

  1. Mesos slaves report their available resources (offers) to the master.
  2. The Mesos master sends these offers to the Marathon framework.
  3. The framework scheduler replies to the master with information about the tasks to run on the slaves, including the CPUs and RAM each task requires.
  4. The Mesos master sends the task details to the load balancer, which distributes the tasks to different slaves.
  5. Finally, the tasks are sent to the slaves, which allocate the resources to the framework, which launches the tasks.
  6. The resources are freed after the tasks complete.
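The offer cycle described in these steps can be sketched as a small in-memory simulation. This is not the Mesos or Marathon API: the classes `Slave`, `Master`, `Scheduler`, `Offer` and `Task`, the first-fit placement rule, and the slave sizes are all hypothetical. The load balancer from step 4 is left out for brevity, so here the master hands tasks to the slaves directly.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    slave: str
    cpus: float
    mem: int  # MB


@dataclass
class Task:
    name: str
    cpus: float
    mem: int  # MB


class Slave:
    def __init__(self, name, cpus, mem):
        self.name, self.cpus, self.mem = name, cpus, mem
        self.running = []

    def offer(self):
        # Step 1: report currently free resources.
        return Offer(self.name, self.cpus, self.mem)

    def launch(self, task):
        # Step 5: reserve the resources and start the task.
        self.cpus -= task.cpus
        self.mem -= task.mem
        self.running.append(task.name)

    def finish(self, task):
        # Step 6: the resources become free again.
        self.running.remove(task.name)
        self.cpus += task.cpus
        self.mem += task.mem


class Scheduler:
    """Stands in for the framework scheduler (Marathon in this article)."""

    def __init__(self, pending):
        self.pending = list(pending)

    def resource_offers(self, offers):
        # Step 3: answer each offer with the tasks that fit into it (first fit).
        placements = []
        for offer in offers:
            cpus, mem = offer.cpus, offer.mem
            for task in list(self.pending):
                if task.cpus <= cpus and task.mem <= mem:
                    placements.append((offer.slave, task))
                    self.pending.remove(task)
                    cpus -= task.cpus
                    mem -= task.mem
        return placements


class Master:
    def __init__(self, slaves):
        self.slaves = {s.name: s for s in slaves}

    def run_cycle(self, scheduler):
        offers = [s.offer() for s in self.slaves.values()]  # step 1
        placements = scheduler.resource_offers(offers)      # steps 2 and 3
        for slave_name, task in placements:                 # steps 4 and 5
            self.slaves[slave_name].launch(task)
        return placements


s1, s2 = Slave("slave-1", 4.0, 8192), Slave("slave-2", 2.0, 4096)
tasks = [Task("web", 1.0, 1024), Task("db", 2.0, 4096), Task("batch", 2.0, 4096)]
master, scheduler = Master([s1, s2]), Scheduler(tasks)

placements = master.run_cycle(scheduler)
for slave_name, task in placements:
    print(f"{task.name} -> {slave_name}")
```

Calling `s2.finish(tasks[2])` afterwards returns the CPUs and memory of the finished task to `slave-2`, so they show up again in its next offer.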

Mesos — functionality for every Distributed System :

The Mesos master provides common functionality that every distributed system requires :

  • Failure detection
  • Package distribution
  • Resource isolation
  • Task distribution
  • Task starting
  • Task monitoring
  • Task killing
  • Task cleanup

Master — The kernel :

  • Allocating resources to different frameworks
  • Flexibility — accommodate diverse frameworks
  • Scalability- scheduler can scale as number of machines & apps increases
  • Fairness — in allocating resources to users/frameworks
  • Fine grained resource sharing using resource offers
  • Manages the Task lifecycle for frameworks
  • An offer represents some resources available on a slave.

Slave :

  • Have resources
  • Responsible for executing tasks — assigned by Frameworks
  • Isolation for each Task
  • Each Task should get the exact resources — not more or less
  • Master manages resources on Slaves
  • Resources of Slaves are consumed by Tasks
  • Slave resources are managed by Master and allocated to Frameworks
  • Each Slave sends its offers to the Mesos Master along with Key-Value pairs [ attributes ] that describe the Slave
  • Frameworks use the Slave attributes in Task management
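As a rough illustration of how a framework might use slave attributes, the sketch below keeps only the offers whose attributes match what a task asks for. The attribute names (`rack`, `disk_type`), the offer layout and the helper `matching_offers` are assumptions made for this example, not the actual Mesos data structures.

```python
# Hypothetical offers: each carries the key-value attributes of its slave.
offers = [
    {"slave": "slave-1", "attributes": {"rack": "r1", "disk_type": "ssd"}},
    {"slave": "slave-2", "attributes": {"rack": "r1", "disk_type": "hdd"}},
    {"slave": "slave-3", "attributes": {"rack": "r2", "disk_type": "ssd"}},
]


def matching_offers(offers, required):
    """Return the offers whose slave has every required attribute value."""
    return [
        offer
        for offer in offers
        if all(offer["attributes"].get(key) == value for key, value in required.items())
    ]


# A task that must run on an SSD-backed slave.
ssd_slaves = [o["slave"] for o in matching_offers(offers, {"disk_type": "ssd"})]
print(ssd_slaves)
```

An empty `required` dictionary matches every offer, which corresponds to a task with no placement preference.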

Resources — offers :

  • Resource attributes — cpus, mem, disk, ports
  • Mem and disk are in MB
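To make the units concrete, here is a minimal sketch of what one offer could look like as plain data. The values, the dictionary layout and the helpers `mb_to_gb` and `fits` are hypothetical; only the four resource names and the MB unit come from the list above.

```python
# A hypothetical offer: mem and disk are in MB, ports are inclusive ranges.
offer = {
    "cpus": 4.0,
    "mem": 8192,       # 8 GB expressed in MB
    "disk": 102400,    # 100 GB expressed in MB
    "ports": [(31000, 32000)],
}


def mb_to_gb(megabytes):
    """Convert the MB values used in offers to GB for display."""
    return megabytes / 1024


def fits(offer, cpus, mem_mb, disk_mb=0, port_count=0):
    """Check whether a task's requirements fit inside one offer."""
    free_ports = sum(high - low + 1 for low, high in offer["ports"])
    return (
        cpus <= offer["cpus"]
        and mem_mb <= offer["mem"]
        and disk_mb <= offer["disk"]
        and port_count <= free_ports
    )


print(mb_to_gb(offer["mem"]), "GB of memory offered")
print(fits(offer, cpus=1.0, mem_mb=2048, port_count=2))
```

Keeping every size in MB avoids mixing units when a scheduler compares task requirements with offers.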

Frameworks (Marathon):

  • Distributed apps which run on Mesos cluster are called Frameworks
  • 2 components : Scheduler & Executor
  • Runs a number of Tasks
  • Tasks consume resources
  • Tasks Lifecycle & management — functions
  • Schedulers- coordinating the execution
  • Executors — control the Task execution, run multiple tasks
  • Provides API to communicate with Scheduler and Executor
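For Marathon specifically, the work to run is described as an app definition and submitted over its REST API. The sketch below only builds such a request and does not send it. The fields `id`, `cmd`, `cpus`, `mem` and `instances` and the `/v2/apps` endpoint are part of Marathon's API; the host name, the app values and the two helper functions are assumptions made for illustration.

```python
import json
from urllib.request import Request


def build_app_definition(app_id, cmd, cpus, mem_mb, instances):
    """Describe what to run and how many resources each instance needs."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem_mb,          # MB
        "instances": instances,
    }


def build_create_request(marathon_url, app):
    """Build (but do not send) the POST request that creates the app."""
    return Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical app and Marathon address.
app = build_app_definition("/hello", "sleep 3600", cpus=0.25, mem_mb=64, instances=2)
request = build_create_request("http://marathon.example.com:8080", app)

print(request.get_method(), request.full_url)
print(json.dumps(app, indent=2))
```

Passing the request to `urllib.request.urlopen` would submit it to a running Marathon, which then waits for offers from Mesos that can hold two instances of the app.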

Scheduler :

  • What Computation should run?
  • Where should Computation run?
  • Talks to Master
  • Master responsible for allocation of Resources

Executor :

  • Starts the task on the scheduled slave.

Mesos — value addition :

  • Provides a Data center kernel
  • High-level abstraction to develop apps that treat distributed infrastructure just like a single large computer
  • Devs can focus only on app logic and not worry about infrastructure
  • Helps in resource allocation, deployment, monitoring and isolation
  • Devs need to know what Resources are needed and not how to get resources

Apache Mesos Installation :

Please refer to the link mentioned below to install Mesos.



