9 DevOps Rules for Startups To Move Fast In 2021


The idea behind infrastructure-as-code is to use a system like Terraform to write all of your infrastructure configurations in a declarative language. This allows you to use version control and enjoy all of its benefits, including code review, change tracking, etc.

However, the biggest advantage of infrastructure-as-code for a startup is that it allows you to rebuild your entire infrastructure in just a few minutes. Established companies don’t often need to do this, but it is common early on to want to blow away and restart everything or stand up a replica of your environment for testing.

Developing your infrastructure with Terraform is more work than manually setting everything up in AWS or another cloud provider, but this effort will pay for itself quickly as changes that would normally become cumbersome and error-prone remain straightforward.

Using infrastructure-as-code also has another subtle but powerful benefit: it forces you to become proficient in DevOps. It may feel like it is slowing you down at first, but learning to build infrastructure with code will empower you to iterate rapidly down the road.

Until recently, there was no way to run a service without managing servers. AWS Lambda changed that. It is now possible to deploy a service purely with code and delegate infrastructure management to the cloud provider.

This saves a tremendous amount of hassle with setting up scaling configuration and updating systems. It also provides good security and reliability out of the box.

Despite its advantages, serverless technology is still somewhat nascent and may be challenging to set up for a complex application with Terraform.

Best practices for organizing larger serverless applications describes some of the ways to design a large serverless application. Having separate Lambdas for each API endpoint presents challenges with management and deployment, while having a single Lambda that routes to different API endpoints loses many benefits of serverless technology.
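
To make the trade-off concrete, here is a minimal sketch of the "single Lambda that routes" option. It assumes the event shape of an API Gateway REST proxy integration (the `httpMethod` and `path` keys); the endpoint functions and route table are hypothetical and only illustrate the pattern.

```python
import json


def list_users(event):
    # Hypothetical endpoint: a real one would query a data store.
    return {"users": []}


def health(event):
    # Hypothetical endpoint used for uptime checks.
    return {"status": "ok"}


# Route table: (HTTP method, path) -> function that builds the response body.
ROUTES = {
    ("GET", "/users"): list_users,
    ("GET", "/health"): health,
}


def handler(event, context):
    """Single Lambda entry point that dispatches to per-endpoint functions."""
    route = ROUTES.get((event.get("httpMethod"), event.get("path")))
    if route is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    return {"statusCode": 200, "body": json.dumps(route(event))}
```

Every endpoint in this sketch shares one deployment, one set of permissions, and one scaling configuration, which is exactly what you give up compared with one Lambda per endpoint.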

Still, you should strongly consider going serverless for new applications, as the technology will continue to improve over time.

In addition to API endpoints, you should give serious thought to ditching servers for landing pages and content management. Gatsby, Netlify, and Netlify CMS make it simpler than ever to build a React website with content that is editable by non-developers with live previewing.

Netlify will then deploy updates across its CDN whenever there is a commit in GitHub (which will happen automatically on the back end for CMS updates without editors needing GitHub access), and can even run A/B tests between different branches.

This setup automatically scales to large amounts of traffic without having to manage any infrastructure. Gatsby has a rich library of starter websites that allow you to get up and running in a day with very high page speed scores.

In cases where you do need to manage servers, Docker has made it easier than ever to rapidly build and update systems. Docker lets you stand up complex infrastructure quickly and rebuild containers in a matter of seconds when making changes. AWS recently announced that you can now use custom Docker images with Lambda, which allows you to go serverless too.

Automated testing is often the first thing on the chopping block when building an MVP. This is not necessarily bad since writing tests takes time and customers don’t buy tests.

However, developers on a new product still need to be able to write tests. With the testing infrastructure in place, they can write a test whenever doing so is faster than repeating manual runs, which saves time immediately.

If you don’t have the testing infrastructure, you also run the risk of building software that is untestable, which can create major problems down the line. It’s hard to predict when you will need good testing, and the last thing you want is to require painful refactoring prior to reaching product/market fit.

Finally, and most importantly, having zero tests will prevent you from truly automating deployment. In Move Fast, Meyerson describes Facebook developers receiving bad “karma” for being unavailable to fix bugs after pushing changes. To really move fast, developers shouldn’t have to worry about deployment most of the time. To do this, they need the ability to write tests.

Start by implementing at least one test from each layer of the testing pyramid: unit, integration, and end-to-end. This shouldn’t be too difficult and will enable developers to make better cost/benefit decisions about testing that progressively speed up future iterations.
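
As an illustration of the bottom layer of the pyramid, here is a minimal unit test sketch using Python's built-in `unittest` module. The `order_total` function is a hypothetical piece of business logic, not something from a real product; the point is only that once one such test runs in your build, adding the next one is cheap.

```python
import unittest


def order_total(prices, discount_percent=0):
    """Hypothetical business logic: sum prices and apply a percentage discount."""
    if not 0 <= discount_percent <= 100:
        raise ValueError("discount_percent must be between 0 and 100")
    subtotal = sum(prices)
    return round(subtotal * (100 - discount_percent) / 100, 2)


class OrderTotalTest(unittest.TestCase):
    def test_applies_discount(self):
        self.assertEqual(order_total([10.0, 20.0], discount_percent=10), 27.0)

    def test_rejects_invalid_discount(self):
        with self.assertRaises(ValueError):
            order_total([10.0], discount_percent=150)
```

Saved in a file such as `test_orders.py` (an assumed name), this can be run with `python -m unittest`. Integration and end-to-end tests need more setup, but they follow the same idea of one working example per layer.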

Most software starts with manual deployment. Setting up a continuous integration/continuous deployment (CI/CD) system takes some work, and a single deployment may only take a few minutes.

However, this time quickly adds up, and it is time that developers could spend building new features. Rather than wasting this time at first, bite the bullet and set up CI/CD right away. (This does not apply to infrastructure-as-code, where CI/CD is probably too difficult with current technology.) Setting up CI/CD early will also expose potential problems with architecture or build stability, which are harder to fix the longer you wait.

With recent advancements in systems like LocalStack, it is now possible to run fairly complex architecture locally, including most of the services provided by AWS. There are a number of benefits to this approach:

  • Resource isolation: Local development guarantees that developers are not stepping on each other’s toes with shared resources. This further reduces the risk of nondeterministic application behavior and flaky tests caused by interference between application instances.
  • Easier infrastructure changes: If the infrastructure is running locally, developers have full permission to make changes and can do so more efficiently.
  • Better performance: Working locally avoids network bottlenecks, which can significantly slow down development, especially with slow or intermittent internet connections.
  • Easier onboarding: Running systems locally eliminates the need to configure cloud infrastructure for each new developer, making onboarding faster.
  • Secure by default: You should already be separating production and testing environments and following other security practices, but local development provides security by default because you are not accessing the cloud.
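
One way to wire an application to a local emulator is to switch the AWS endpoint based on the environment. The sketch below assumes LocalStack is listening on its default edge port (4566). The `APP_ENV` and `LOCALSTACK_ENDPOINT` variable names are hypothetical, and the dummy credentials are placeholders for local use only.

```python
import os

# Assumption: LocalStack is running locally on its default edge port.
LOCALSTACK_URL = "http://localhost:4566"


def aws_client_kwargs(env=None):
    """Build keyword arguments for an AWS SDK client.

    In the (hypothetical) "local" environment, point the client at LocalStack
    with dummy credentials. Otherwise return no overrides so the SDK falls
    back to its normal endpoint and credential resolution.
    """
    env = os.environ if env is None else env
    if env.get("APP_ENV", "local") == "local":
        return {
            "endpoint_url": env.get("LOCALSTACK_ENDPOINT", LOCALSTACK_URL),
            "region_name": "us-east-1",
            "aws_access_key_id": "test",
            "aws_secret_access_key": "test",
        }
    return {}
```

With boto3 installed, the result can be passed straight through, for example `boto3.client("s3", **aws_client_kwargs())`, so the same application code runs against LocalStack or real AWS.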

While local development can be helpful, the technology is still in its infancy. There are some scenarios where you may still want to use cloud services, including:

  • Insufficient parity with production: Tools like LocalStack try to faithfully represent real services in AWS, but there are some idiosyncrasies. Performance will also naturally be different. If your work needs true parity (e.g., for load testing), you may still need a cloud-based environment, though that doesn’t prevent you from doing other tasks in a local environment.
  • Connecting to third-party services outside of AWS: Most services do not have local emulation. However, you may want to create mocked versions of third-party services to enable some local development, which also facilitates end-to-end tests during the build process without external dependencies. (It really sucks when builds fail due to flaky third-party services.)
  • Storing shared credentials for third-party services: If you connect to third-party services using shared credentials, then you need to store them somewhere. A tool like AWS Secrets Manager makes this much easier than managing secrets manually, and of course, is better than checking in secrets to your repository. (You can run Secrets Manager with LocalStack, but then you’d have to store the secrets themselves locally, which still presents a problem.)
  • Fetching test data sets: If your application needs test data sets that are too big to check into the repository as fixtures, then it makes sense to store them in a shared location like an S3 bucket.
  • Resource-intensive applications: If you are developing an application that is not feasible to execute on a local computer due to lack of resources, then you may have to run it in the cloud.
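
The mocked third-party service mentioned above can be as small as an in-process HTTP server. This is a rough sketch using only the Python standard library. The payments API, its `/payments/<id>` path, and the `fetch_payment_status` function are all hypothetical stand-ins for whatever external service your application calls.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakePaymentsHandler(BaseHTTPRequestHandler):
    """Stand-in for a hypothetical third-party payments API."""

    def do_GET(self):
        # Always answer with a canned response, whatever the path.
        body = json.dumps({"status": "paid"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep build output quiet.


def start_fake_service():
    """Start the fake on a free local port and return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), FakePaymentsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fetch_payment_status(base_url, payment_id):
    """Application code under test; base_url is injected so tests can redirect it."""
    with urllib.request.urlopen(f"{base_url}/payments/{payment_id}") as response:
        return json.load(response)["status"]
```

A test would call `start_fake_service()`, pass the returned URL to `fetch_payment_status`, and call `server.shutdown()` afterwards. Because the base URL is a parameter, the build never depends on the real service being up.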

One thing you should avoid is relying on cloud-based development for demos or scenarios where a different person needs to access a development version of an application. Instead, the CI/CD system should be able to create staging builds for this purpose.

MVPs have lots of errors, and errors drive away customers. It is best to discover errors before customers do (or worse, investors). Early teams often check their analytic dashboards on a regular basis to make sure everything looks good.

Constantly polling for information wastes precious time that could be used on feature iteration, and still leaves potentially large gaps between error occurrence and detection.

From the outset, new products should push out information about important error conditions. Developers should be comfortable with silence and confident that things are working without actively checking any logs or reports. This lets them work without interruption while still quickly responding when there is a problem.

Like other areas of automation, it’s easy to go too far with error monitoring and instrumentation. The following checks are relatively easy to set up and make a good starting point:

  • Alarms using a service like CloudWatch on all critical resources that could take down a service, such as CPU, memory, disk space, etc. (You can avoid most of this if you go serverless.)
  • Frontend and backend exception tracking with systems like NewRelic and TrackJS. To cut down on noise, you will want to set thresholds and filter unimportant exceptions. (See How to Manage Exceptions at Scale for how we integrated this with a notification process at Collage.com.)
  • Alarms on service error rates, particularly 5xx, which provides a backstop to exception tracking.
  • External service ping using something like NewRelic to make sure your services are reachable even if there are no internal alarms.
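
The logic behind an error-rate alarm is simple enough to sketch directly. In practice you would configure this in your monitoring service rather than write it yourself. The 5% threshold and the 20-request minimum below are arbitrary assumptions chosen for illustration.

```python
def error_rate(status_codes):
    """Return the fraction of responses in the window that were 5xx."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def should_alert(status_codes, threshold=0.05, min_requests=20):
    """Decide whether a window of HTTP status codes should trigger an alarm.

    The minimum request count avoids paging someone because a single
    failed request made the error rate 100% during a quiet period.
    """
    if len(status_codes) < min_requests:
        return False
    return error_rate(status_codes) > threshold
```

Thresholds like these are what keep an alarm from becoming noise: a lone 500 at 3 a.m. stays silent, while a sustained spike does not.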

One way to set up these error monitoring tools is to have them push alerts to email. Don’t do this. Requiring developers to check their email (or worse, turn on email notifications) isn’t much better than having to check dashboards.

Even with a single developer, it’s worth setting up a notification system like PagerDuty. This will bubble important errors to the top while minimizing noise. As the team grows, it also prevents spamming alerts to multiple people while still providing safety with fallback notifications.

When you get your first customers, the thought of making production data available for development is alluring. Real data is richer, and using it lets you see how your features will look to customers.

The obvious downside of connecting development to production in any way (even having credentials on a development system only to read data) is that it risks catastrophic mistakes or security incidents. The bigger problem, however, is that real data is unstable and may change or go away, which could leave you in a bad place.

Investing in a good test data set (which may replicate anonymized information from production) and fully cutting off any interface between development and production will free you from constraints and risks that could slow you down at a critical time.
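
As a rough sketch of what replicating anonymized information might look like, the example below replaces identifying fields with a keyed hash before a record is copied into a test data set. The record fields (`id`, `email`, `name`, `plan`) are hypothetical. Keyed hashing is strictly pseudonymization, so treat this as a starting point, not a guarantee that no one can be re-identified.

```python
import hashlib
import hmac


def pseudonymize(value, key):
    """Replace a sensitive value with a stable, keyed hash token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def anonymize_user(record, key):
    """Return a copy of a (hypothetical) user record that is safe for test data.

    Identifying fields are replaced; non-identifying fields such as the
    plan are kept so the test data still resembles production.
    """
    token = pseudonymize(record["email"], key)
    return {
        "id": record["id"],
        "email": f"user-{token}@example.com",
        "name": f"User {token[:6]}",
        "plan": record["plan"],
    }
```

Because the same input and key always produce the same token, relationships between records survive the export. The key itself should stay on the production side and never reach development.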

Security is probably the last thing on anyone’s mind when building an MVP. Of all the causes of startup death, data leaks and ransomware are probably not high on the list. But with the ease of implementing decent security practices, why take the risk? If any of your customers are businesses, you may also have to answer questions about security or undergo an audit at some point in the future. Scrambling to clean up bad practices is far more work than keeping things clean from the start. Here are some basic security practices you can follow without much effort:

  • Use two-factor authentication with all services that offer it.
  • Use a password manager and randomly generated passwords for everything.
  • Never check in credentials, API tokens, or secrets to code. Use local files or systems like AWS Secrets Manager instead.
  • Opt for systems like AWS Identity and Access Management (IAM) that provide time-limited credentials rather than using basic password authentication.
  • Encrypt everything. This includes full disk encryption for all local systems, database encryption, and SSL/TLS or SSH for all connections. Encryption is just a box to check when configuring these services, with negligible performance impact.
  • Don’t allow any direct network access to systems other than what is strictly necessary. Use a bastion configuration instead and require two-factor authentication for login. If you have to allow external database connections for things like third-party ETL providers, whitelist individual IP addresses only.
  • If development is not fully local, use a separate AWS account for development. Even if you try to create separate resources and limited credentials for development in a production account, mistakes are bound to happen, so a separate account is safer.
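
To illustrate the rule about keeping credentials out of code, here is a minimal sketch that reads a secret from the environment and fails loudly when it is missing. The `get_secret` helper and the `MissingSecretError` class are hypothetical names. The environment could be populated from a local untracked file or from a system like AWS Secrets Manager.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def get_secret(name, env=None):
    """Read a secret from the environment instead of hard-coding it.

    Failing fast at startup is friendlier than a confusing authentication
    error from a third-party service later on.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise MissingSecretError(
            f"Set the {name} environment variable; do not hard-code it."
        )
    return value
```

Application code would then call something like `get_secret("PAYMENTS_API_TOKEN")` (an assumed variable name), and nothing sensitive ever lands in the repository.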

