During a CTF, we don’t want players to manipulate the infrastructure at will: starting instances is costly, requires computational capacity, etc. This must be controlled while still giving players the power to manage their own instances: there is a sweet spot, which might not be easy to find.
For this reason, one goal of the chall-manager is to provide ephemeral (or not) scenarios. Ephemerality implies lifetimes, expirations, then deletions.
To implement this, for each Challenge the ChallMaker and Ops can set a `timeout` in seconds after which the Instance will be deleted once up and running, or an `until` date after which the instance will be deleted whatever the timeout. When an Instance is deployed, its start date is saved, and every update is stored for traceability. A participant (or a dependent service) can then renew an instance on demand for additional time, as long as it stays under the `until` date of the challenge. This is based on the hypothesis that a challenge should be solved within \(n\) minutes.
The timeout should be evaluated based on an expert’s point of view regarding the complexity of the designed challenge, taking into account the participants’ skill sets (an expert can be expected to solve an introductory challenge in seconds, while a beginner can take several minutes).
There is no “rule of thumb”, but we recommend double-testing the challenge with both a domain expert, for technical difficulty, and another ChallMaker unrelated to this domain.
Deleting expired instances then becomes a new goal of the system. We cannot extend the chall-manager to do so, as it would violate the Separation of Concerns principle: it is the goal of another service, chall-manager-janitor. This is also justified by the frequency model applied to the janitor, which is unrelated to the chall-manager service itself.
With such an approach, released resources can be used by other players. Nevertheless, it requires a mechanism to wipe out infrastructure resources after a given time.
Some tools exist to do so.
Tool | Environment |
---|---|
hjacobs/kube-janitor | Kubernetes |
kubernetes-sigs/boskos | Kubernetes |
rancher/aws-janitor | AWS |
Although such tools exist, they are context-specific and therefore limited: each one has its own mechanism and targets a single environment. For the sake of genericity, we want an approach able to handle all ecosystems without the need for specific implementations. For instance, if ChallMakers decide to cover a unique, private and offline ecosystem, what could they do?
That is why the janitor must have the same level of genericity as chall-manager itself. Although it is not optimal for specific providers, we expect this genericity to be a better tradeoff than covering a limited set of technologies. This modular approach enables covering new providers (vendor-specific, public or private) without involving CTFer.io in the loop.
Using the chall-manager API, the janitor looks up expiration dates.
Once an instance has expired, it simply deletes it.
Triggered by a cron, the janitor can then monitor the instances frequently.
flowchart LR
  subgraph Chall-Manager
    CM[Chall-Manager]
    Etcd
    CM --> Etcd
  end
  CMJ[Chall-Manager-Janitor]
  CMJ -->|gRPC| CM
If two janitors trigger in parallel, the API will maintain consistency. Error codes are to be expected, but no data inconsistency.
As it does not plug into any provider-specific mechanism or requirement, it guarantees platform agnosticism. Whatever the scenario, the chall-manager-janitor will be able to handle it.
The algorithm below determines the instance `until` date from the challenge configuration for both `until` and `timeout`.
Renewing an instance re-executes this to ensure consistency with the challenge configuration.
Based on the instance `until` date, the janitor determines whether to delete it or not (\(instance.until < now() \Rightarrow delete(instance)\)).
flowchart LR
  Start[Compute until]
  Start-->until{"until == nil ?"}
  until---|true|timeout1{"timeout == nil ?"}
  timeout1---|true|out1["nil"]
  timeout1---|false|out2["now()+timeout"]
  until---|false|timeout2{"timeout == nil ?"}
  timeout2---|true|out3{"until"}
  timeout2---|false|out4{"min(now()+timeout,until)"}
Listening to the community’s first feedback, we tried to lower the barrier to entry for Chall-Manager. We therefore created a Software Development Kit.