I really like conferences: they give me ideas about what’s next, what to learn, whom to know, new whitepapers to read and products to check out. Conferences are as much about the people you can meet as they are about the technical talks. I was very lucky to go to this one thanks to Sighup, and going with a friend made it even better.
dotConferences are a series of conferences started in 2012 with the goal of creating a TED-like technology event targeting the developer community. This was my second one - the first being dotGo last November - and I’m really liking the format and the quality so far.
The best feature, in my opinion, is that they’re single track: there is no choosing what to see. This takes the pressure of choosing correctly off the attendees and forces organisers to find the best possible speakers.
This is a list of ideas and topics from dotScale 2018:
David Gageot, from Google, started the morning with a good introduction to Istio and the problems it solves. He used Skaffold (which he contributes to) to deploy a microservice application to Kubernetes, then showed Istio providing automatic tracing, metrics and general insight into the application’s behavior. To give this presentation he wrote his own tool, which is super cool and which we all hope he’ll release very soon.
Preeti Vaidya, from Viacom, illustrated how important it is to choose a database based on the structure of your data rather than on how comfortable developers are with a specific tool. She gave a four-step guide to choosing the type of database to use: Operations, Application, Data Model, Data Store. Long story short: it’s important to understand what queries your application is going to perform on the data before choosing a relational database.
Damien Tournoud, founder of Platform.sh, gave an awesome talk on how monitoring is done inside his company. They built their own monitoring system from scratch, starting from a simple, outside-the-box idea: monitoring is about the business logic of the monitored system. They built a model of their infrastructure and systems, and from that model they generate alerts and monitoring metrics. This approach is interesting compared to the “just monitor everything” school of thought, because the hardest part of monitoring is creating meaningful and actionable alerts.
After this there were a few lightning talks:
- Enrico Signoretti, from OpenIO, described a new device in the storage industry called nano-nodes.
- Dan, from Datadog, talked about how FaaS offerings are “not magic” but rather just an event-driven architecture as a service.
- Mike Freedman, from TimescaleDB, illustrated how most data is actually time-series data and how viewing data this way can help your business.
Willy Tarreau, creator of HAProxy, gave us a few pointers on how to introspect applications in production with HAProxy. HAProxy, like most load balancers, has a different point of view on your infrastructure than most other applications, which is why it’s important to have a clear understanding of the metrics behind it. A few things to keep in mind: watch the queue length; use the halog tool; logging only 5% of requests sometimes helps when the request count is very high; and similar systems should have similar metrics, so checking the differences between them is more important than the absolute values.
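The last point, comparing similar systems with each other rather than looking at absolute values, can be sketched in a few lines of Python. This is only an illustration, not something shown in the talk: the function name `find_outliers`, the server names, and the 50% tolerance are all assumptions made up for the example.

```python
from statistics import median


def find_outliers(metrics: dict[str, float], tolerance: float = 0.5) -> list[str]:
    """Return the servers whose metric deviates from the group median
    by more than `tolerance`, expressed as a fraction of the median."""
    if not metrics:
        return []
    mid = median(metrics.values())
    if mid == 0:
        # With a zero median, any non-zero value counts as a deviation.
        return sorted(name for name, value in metrics.items() if value != 0)
    return sorted(
        name
        for name, value in metrics.items()
        if abs(value - mid) / abs(mid) > tolerance
    )


# Hypothetical queue lengths for four supposedly identical backends.
queue_lengths = {"web1": 12, "web2": 14, "web3": 13, "web4": 95}
print(find_outliers(queue_lengths))
```

Here the absolute queue length matters less than the fact that one backend behaves differently from its peers.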
Yaroslav Tkachenko, from Activision, explained how to manage data pipelines for the Call of Duty games using Kafka, with a custom stream processor in front of it to normalize incoming data and a schema registry to define message formats. This last bit, I think, is the most important part: queues offer the best way to decouple producers from consumers in a large infrastructure, but decoupling can sometimes get out of hand when the consumers don’t know what kind of data comes in. A schema registry is useful because it provides a common definition of the structure of the messages produced and consumed. Standardisation of processes and API definitions is always key to scalability.
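To make the schema registry idea concrete, here is a minimal in-memory sketch in Python. It is not how Activision’s pipeline or any real registry works: the `SchemaRegistry` class, the topic name `match_events`, and the field names are hypothetical, and a schema is simplified to a mapping from field name to Python type.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaRegistry:
    """Toy registry: one schema per topic, shared by producers and consumers."""

    schemas: dict[str, dict[str, type]] = field(default_factory=dict)

    def register(self, topic: str, schema: dict[str, type]) -> None:
        self.schemas[topic] = dict(schema)

    def validate(self, topic: str, message: dict) -> list[str]:
        """Return a list of problems; an empty list means the message conforms."""
        schema = self.schemas[topic]  # raises KeyError for unknown topics
        errors = []
        for name, expected in schema.items():
            if name not in message:
                errors.append(f"{name}: missing")
            elif not isinstance(message[name], expected):
                actual = type(message[name]).__name__
                errors.append(f"{name}: expected {expected.__name__}, got {actual}")
        for name in message:
            if name not in schema:
                errors.append(f"{name}: unexpected field")
        return errors


registry = SchemaRegistry()
registry.register("match_events", {"player_id": str, "score": int})

print(registry.validate("match_events", {"player_id": "p1", "score": 10}))
print(registry.validate("match_events", {"player_id": "p1", "score": "10"}))
```

Both sides check messages against the same definition, so a producer cannot silently change the shape of the data that consumers rely on.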
Next it was Marc Shapiro’s turn, explaining how sometimes neither strong consistency nor eventual consistency is enough to satisfy our applications’ needs. As always in computer science there must be a third option: just-right consistency. The idea is to have a system that’s as available as possible and as consistent as necessary; achieving this relies on a few concepts, namely CRDTs and causal consistency.
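For readers who have never met a CRDT, the sketch below shows one of the simplest, a grow-only counter (G-Counter), in Python. It is a textbook example rather than anything taken from the talk, and it does not cover causal consistency; the class name `GCounter` and the replica names are assumptions for illustration.

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merging takes the per-replica maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)


# Two replicas accept writes independently, then exchange state.
paris, tokyo = GCounter("paris"), GCounter("tokyo")
paris.increment(2)
tokyo.increment(3)
paris.merge(tokyo)
tokyo.merge(paris)
print(paris.value(), tokyo.value())
```

Both replicas stay available for writes while disconnected and still agree on the total once they have merged.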
Finally it was the turn of Paul Dix, cofounder and CTO of InfluxData, who gave us some insight into the Influx Data Platform 2.0. The talk was called “Everything at scale is hard” and described his and the company’s journey through years of changing infrastructure paradigms, from VMs to containerised workloads and from monolith to microservices. The Data Platform he envisioned is based on 3 different storage tiers (memory, local SSDs and object storage) glued together by a unified query language called FLUXLANG. This new “language for data” is a DSL meant to be run on distributed systems, and it’s a complete rewrite that moves past earlier SQL-ish attempts. The two quotes that struck me the most are “data has gravity” and the final statement, “everything at scale is interesting”.
The last talk was given by Jeremy Edberg, who worked at Reddit and Netflix. He described how infrastructure data should be treated like any other type of data in a company. How can we derive velocity from movements, exactly like in physics 101? He designed a tool that is basically a SQL interface over the major cloud providers’ infrastructure: imagine if your data scientists could query and analyse the changes or versions of your cloud infrastructure in the same way they run queries on customer data.
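As a rough illustration of “deriving velocity” from infrastructure data with plain SQL, the Python sketch below loads a few snapshots into an in-memory SQLite database and computes the rate of change between consecutive snapshots. This is not Jeremy’s tool: the `snapshots` table, its columns, and the sample numbers are invented for the example.

```python
import sqlite3


def instance_velocity(rows: list[tuple[int, int]]) -> list[tuple[int, float]]:
    """Given (day, instance_count) snapshots, return (day, instances per day)
    for every snapshot that has an earlier one to compare against."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots (day INTEGER PRIMARY KEY, instances INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO snapshots VALUES (?, ?)", rows)
    query = """
        SELECT cur.day,
               (cur.instances - prev.instances) * 1.0 / (cur.day - prev.day)
        FROM snapshots AS cur
        JOIN snapshots AS prev
          ON prev.day = (SELECT MAX(s.day) FROM snapshots AS s WHERE s.day < cur.day)
        ORDER BY cur.day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical snapshots: the fleet grows by 4 instances in one day,
# then by 6 instances over the next two days.
print(instance_velocity([(1, 10), (2, 14), (4, 20)]))
```

Once infrastructure state is stored as rows over time, questions about how fast it changes become ordinary queries.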
Overall the conference was amazing. It’s really important for us to stay up to date and to actively participate in and contribute to the community. Conferences are an especially important chance to step away from daily work-related problems and focus on higher-level concepts and ideas.
You can follow me on Twitter and ping me there if you have questions or want to quickly discuss any of the topics mentioned.
A huge thank-you to Sighup. You should follow them as well, especially if you are interested in distributed systems and Kubernetes. We are hiring! Shoot us an email if you love automating things and challenging yourself with hard problems, and are eager to learn constantly.