Engineering time is a scarce resource. We constantly have to balance many tasks and often conflicting priorities. Still, there are some activities where investing more of that time pays off. In this article, we’ll look at ten of them.
Have you ever deleted something prematurely, only to discover that there was no backup? A good rule of thumb is to check three times before deleting anything. This may involve cross-checking whether we are in the right environment, region, database schema, or S3 bucket.
Additionally, there are several ways to mitigate the impact of unintentional deletions:
- you can enable versioning on cloud storage buckets (e.g., AWS S3)
- you can configure automated backups
- you can restrict access so that only a few people can delete things
- many databases offer additional data recovery features such as time travel, which lets you go “back in time” and query data as of a point before it was deleted (e.g., Snowflake, Databricks, BigQuery).
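To make the “check three times” rule harder to skip, you can script the cross-check itself. The sketch below is one hypothetical way to do it in Python: `Target`, `DeletionAborted`, and `confirm_deletion_target()` are illustrative names, not part of any library, and how you obtain the *actual* environment, region, and resource (for example, from your CLI profile or connection string) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Where a destructive operation is about to run (hypothetical fields)."""
    environment: str  # e.g. "dev", "staging", "prod"
    region: str       # e.g. "eu-central-1"
    resource: str     # e.g. an S3 bucket or a database schema name


class DeletionAborted(Exception):
    """Raised when the actual target differs from the one we meant to delete."""


def confirm_deletion_target(intended: Target, actual: Target, allow_prod: bool = False) -> None:
    """Cross-check environment, region, and resource before deleting anything.

    Raises DeletionAborted on any mismatch, or when the target is production
    and the caller did not explicitly opt in with allow_prod=True.
    """
    mismatches = [
        name
        for name in ("environment", "region", "resource")
        if getattr(intended, name) != getattr(actual, name)
    ]
    if mismatches:
        raise DeletionAborted(f"Refusing to delete: mismatch in {', '.join(mismatches)}")
    if actual.environment == "prod" and not allow_prod:
        raise DeletionAborted("Refusing to delete in prod without allow_prod=True")


# Example: we think we are in staging, but our session points at prod.
try:
    confirm_deletion_target(
        Target("staging", "eu-central-1", "analytics-tmp"),
        Target("prod", "eu-central-1", "analytics-tmp"),
    )
except DeletionAborted as err:
    print(err)  # Refusing to delete: mismatch in environment
```

Calling a guard like this right before the destructive command turns a mental checklist into a hard stop. It complements versioning and backups rather than replacing them.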
All major cloud providers offer an object storage service. Most of them, such as AWS S3 and Google Cloud Storage (GCS), store data in buckets. Microsoft, however, chose the name “containers” rather than “buckets” for its Azure Blob Storage.
Why is this naming confusing?
Because containers are typically associated with running instances of Docker images, not with storing blobs. This example shows that even the largest technology companies in the world sometimes make confusing naming decisions.
One of my colleagues often repeats a quote that sums this up nicely:
“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton
Great technology products are rarely created in a vacuum — they emerge when several smart people support each other in solving challenging problems. That’s why feedback and code reviews are so critical. They serve as a basis for constructive discussions that lead to better engineering processes and better code.
- Don’t: write “LGTM!” and approve the pull request a minute after receiving it.
- Do: take as much time as needed to understand the code and the intentions behind it, and question whether everything works as expected.
Most engineers creating pull requests genuinely want to hear your thorough feedback and learn from your experience. They rely on you to identify mistakes and issues they might not have thought about.
You have probably encountered this many times in your career. Somebody assigns you a ticket that says “do XYZ” without explaining the reason behind it. It’s assumed upfront that XYZ is the right solution to the problem and that you should simply do it. But when you start digging deeper into the actual issue, you notice that XYZ may not be the optimal approach.
- Don’t: create a ticket that prescribes a specific approach, such as “Build a script that collects data about X and shares it with client ABC as an Excel email attachment.”
- Do: create a ticket that defines the problem: “Client ABC needs to receive data about X on a daily basis. Talk to the client, and figure out the best interface to share this data in a reliable and secure way.”
It’s always helpful when a ticket or user story defines the problem and the stakeholders involved. It may suggest potential solutions, but it shouldn’t prescribe exactly what must be done unless you’re really sure about it. Once the problem is defined, engineers are smart enough to figure out the best way to tackle it.
Architecture decisions typically have far-reaching consequences. Once things are implemented, it’s expensive to “undo” them (e.g., through time-consuming migration projects).
Still, we often don’t take the time to evaluate enough options, and we fail to ask for feedback. The most popular tool isn’t necessarily the best one for the problem at hand.
- Don’t: rush into implementing the first design draft or the most popular tool.
- Do: take time to evaluate which architectures or designs are good candidates for the problem at hand. Build proofs of concept. Ask other engineers, stakeholders, and external consultants for feedback. Test various options in practice, and then decide what works best.
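Real architecture proofs of concept are much larger than a micro-benchmark, but the same discipline applies at every scale: run the candidates on the same input, confirm they produce the same result, and only then compare them. Here is a minimal Python sketch of that idea using the standard library’s `timeit`; the `compare_candidates()` helper and the two deduplication functions are hypothetical examples chosen for illustration.

```python
import timeit
from functools import partial
from typing import Callable, Dict, Sequence


def compare_candidates(
    candidates: Dict[str, Callable[[Sequence[int]], object]],
    data: Sequence[int],
    repeat: int = 5,
    number: int = 100,
) -> Dict[str, float]:
    """Verify that all candidates return the same result, then time each one.

    Returns the best (minimum) observed time per call, in seconds.
    """
    results = {name: fn(data) for name, fn in candidates.items()}
    reference = next(iter(results.values()))
    disagreeing = [name for name, result in results.items() if result != reference]
    if disagreeing:
        # A fast but wrong option is not an option.
        raise ValueError(f"Candidates disagree with the first one: {disagreeing}")

    timings = {}
    for name, fn in candidates.items():
        # Each of the `repeat` runs performs `number` calls of fn(data).
        runs = timeit.repeat(partial(fn, data), repeat=repeat, number=number)
        timings[name] = min(runs) / number
    return timings


# Two hypothetical candidates for "remove duplicates, keep the original order".
def dedupe_loop(items: Sequence[int]) -> list:
    seen, out = set(), []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def dedupe_dict(items: Sequence[int]) -> list:
    return list(dict.fromkeys(items))


timings = compare_candidates(
    {"loop": dedupe_loop, "dict.fromkeys": dedupe_dict},
    data=list(range(1000)) * 3,
)
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.1f} microseconds per call")
```

Taking the minimum of several runs reduces noise from other processes; for an actual architecture decision, you would replace the toy functions with realistic workloads and also weigh cost, operability, and team familiarity.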
Sometimes reaching an agreement on common standards is half the battle. Failing to communicate with other teams may lead to friction and conflict.
For instance, in recent years many data teams have invested heavily in cloud data warehouses, data ingestion platforms, and SQL-based transformation tools. Then they started to advocate that, from now on, everyone should use SQL to solve all their analytical problems.
Following this advice, one might think that we no longer need distributed compute frameworks such as Dask or Spark. But by trying to standardize solely on SQL, we forget about other teams. What about data scientists and quantitative researchers?
SQL alone is not enough to solve their problems. The same is true when you need to serve data through APIs, automate processes, or tackle many other interesting use cases that leverage data for more than reporting. Python would be a much better choice for those use cases.
Similarly, Paul Singman suggested in this article that agreeing on common data definitions and interfaces between software engineers and data teams can sometimes eliminate the need for ETL jobs. Note that this is feasible only in a few circumstances, such as when building tools based on a publish-subscribe (pub-sub) architecture.
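One way to picture such an agreement is a single, versioned event definition that both the producing service and the consuming data pipeline import, so there is nothing left to reshape in between. The sketch below assumes a hypothetical `OrderPlaced` event serialized as JSON; the message broker that carries it is out of scope. In practice, teams often formalize contracts like this with a schema registry and a format such as Avro or Protobuf instead of a hand-written class.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class OrderPlaced:
    """Hypothetical event contract shared by the producing and consuming teams."""
    order_id: str
    customer_id: str
    amount_cents: int
    currency: str
    schema_version: int = 1

    def to_message(self) -> str:
        """Serialize for publishing to a topic (producer side)."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_message(cls, message: str) -> "OrderPlaced":
        """Parse and validate a message (consumer side); reject contract violations."""
        payload = json.loads(message)
        expected = {f.name for f in fields(cls)}
        if set(payload) != expected:
            raise ValueError(f"Unexpected or missing fields: {sorted(set(payload) ^ expected)}")
        if type(payload["amount_cents"]) is not int:
            raise ValueError("amount_cents must be an integer")
        return cls(**payload)


message = OrderPlaced("o-123", "c-42", 1999, "EUR").to_message()
print(message)
# {"amount_cents": 1999, "currency": "EUR", "customer_id": "c-42", "order_id": "o-123", "schema_version": 1}
print(OrderPlaced.from_message(message))
```

Because both sides depend on the same definition, a breaking change shows up as a failed validation (or a new `schema_version`) instead of a silently broken downstream job.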
Giving praise or paying compliments may feel awkward sometimes, but we all crave validation to some extent. It’s fascinating to see the positive impact of a simple “Great job!”. Watch out for false praise, though; people can tell when compliments are insincere.
It’s challenging to find and manage engineering talent these days. But hiring prematurely and then having to let people go can harm the company culture and team morale. Many adopt the “hell yes or no” approach: if a candidate isn’t a clear “hell yes,” the answer is no. Whichever strategy you choose, it’s advisable to take your time and be intentional about the team you want to build.
Some issues in engineering stem from not reading the logs properly.
A true story: on my first consulting project, we were working on a Hadoop cluster, and I had trouble figuring out why my Spark job had failed. I asked a colleague, and he pointed me to a Java stack trace with the error message, which I had somehow overlooked in the log files.
I have to admit he was right: the answer was in the logs. I just hadn’t taken enough time to read them thoroughly.
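When a JVM job such as a Spark application fails, the useful lines are often the exception headlines (`Exception in thread ...` and `Caused by: ...`) buried among hundreds of `at ...` stack frames. The following Python sketch pulls out just those headlines. The regular expression is a heuristic assumption that covers common Java/Scala stack-trace formats, not every logging layout, and the sample log is made up for illustration.

```python
import re
from typing import Iterable, List

# Heuristic for JVM-style exception headlines (an assumption, not a spec), e.g.
#   Exception in thread "main" java.lang.IllegalStateException: ...
#   Caused by: java.io.FileNotFoundException: ...
# Stack frames ("at com.example.Foo.bar(Foo.java:1)") deliberately do not match.
EXCEPTION_HEADLINE = re.compile(
    r'^(?:Exception in thread "[^"]*" |Caused by: )?'
    r"(?:[A-Za-z_$][\w$]*\.)+[A-Z][\w$]*(?:Exception|Error)\b"
)


def find_exception_lines(lines: Iterable[str]) -> List[str]:
    """Return exception headlines in order of appearance, without surrounding whitespace."""
    return [line.strip() for line in lines if EXCEPTION_HEADLINE.match(line.strip())]


# A made-up log excerpt for illustration.
sample_log = """\
2021-03-04 10:15:02 INFO Starting job daily_report
Exception in thread "main" java.lang.IllegalStateException: Job failed
\tat com.example.jobs.DailyReport.run(DailyReport.java:42)
Caused by: java.io.FileNotFoundException: /data/input.csv (No such file or directory)
\tat java.io.FileInputStream.open0(Native Method)
""".splitlines()

for headline in find_exception_lines(sample_log):
    print(headline)
```

Reading the last `Caused by:` line first is usually a good shortcut, since the innermost cause tends to point at the root of the failure.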
If you want to avoid similar embarrassing situations and you happen to use serverless, have a look at Dashbird. The platform allows you to filter through the logs of all your serverless AWS resources.
You don’t need to set up any custom log handlers in your code: Dashbird pulls the logs directly from the CloudWatch APIs.
You can then search through all your logs in real time from the UI, including X-Ray traces.
Communication issues are often cited as a major reason why IT projects fail. We have more communication tools than ever before, yet it’s still challenging to find the balance between under- and overcommunicating.
- Don’t: send fifty short messages every twenty minutes.
- Do: think about what needs to be communicated, write it down, and choose the best channel for it: an email (if a response time of 1–2 days is fine), a call (if it requires a discussion or screen share), or an instant message (if it’s critical or something that blocks you).