Nobody appreciates the words "best practice," especially when they have no idea why the practice is best or who said so. The phrase has encroached on the territory formerly occupied by the adage "in my humble opinion." In cloud security, the former has almost supplanted the latter.
That is unfortunate, because valuable security advice then falls on deaf ears. Case in point: using AWS IAM roles for access control. The best practice is stated frequently without a compelling rationale. The security team tells you, "Use roles because they are better for security." Everything the security team says is probably better for security. Engineers, however, need to understand why they should adopt a less familiar pattern. Otherwise, the benefits of the best practice will never be realized, because the developer will never adopt it.
IAM roles are permission sets that entities may assume to gain the access needed to perform a task. Roles differ from groups because the access they grant is narrower in scope and shorter in duration than access granted by group membership. They are like a hat, which you wear for one session. You may define an application hat, a database hat, and a network hat. Each user or resource switches hats whenever it needs to perform a different function.
Why should you use IAM roles as your primary means of access control? Why does the practice make system security more robust? Here are four reasons:
1. Least privilege
Roles separate the "who" from the "what." Roles group permissions logically by function rather than identity. When you write and attach policies to a role, you may select the narrowest possible scope of access related to a single functionality.
For example, take an AWS-hosted web application instance that uses a cache and a database. The role that runs the application requires read and write access to all three: the instance, the cache, and the database. Another role, for making scheduled backups, requires only read access to the database. Yet another role, for cross-region inventory synchronization, requires read access to all three.
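The three roles in this example can be sketched as a toy model. This is not IAM policy syntax; the role names, resource labels, and "read"/"write" flags are made up for illustration. The point is only that each role lists the narrowest access its function needs, and everything else is denied.

```python
# Toy model of the three roles described above. The role names, resource
# labels, and "read"/"write" flags are illustrative, not IAM policy syntax.
ROLES = {
    "RunWebApplication": {
        "instance": {"read", "write"},
        "cache": {"read", "write"},
        "database": {"read", "write"},
    },
    "BackupDatabase": {
        "database": {"read"},
    },
    "SyncInventoryAcrossRegions": {
        "instance": {"read"},
        "cache": {"read"},
        "database": {"read"},
    },
}


def is_allowed(role: str, resource: str, access: str) -> bool:
    """Return True only if the role explicitly grants this access (default deny)."""
    return access in ROLES.get(role, {}).get(resource, set())
```

With this model, is_allowed("BackupDatabase", "database", "write") is False: the backup hat can read the database and do nothing else.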
You may try to accomplish this level of granular isolation using users and groups only. In practice, such an attempt turns access control into a nightmare. You end up with a tangled mess of extra users and groups. Their permissions stack, duplicate, and bloat as more functionality and complexity are added to the system.
In the end, maintaining the principle of least privilege becomes practically impossible. The security champion tasked with sorting the porridge will most likely give up after causing service outages by mistake.
On the flip side, roles are constructed by walking backward from resources without thinking about identity. One function should have one role. As functionality is added to the system, the number of roles also grows. But the key difference here is that each new role is lean and mean by aiming strictly at as few resources as possible. Roles are easier to examine and trim one by one compared to groups. The principle of least privilege has a solid chance to persist despite the growing complexity.
2. Stasis
Roles are more static than users and groups because each is tied to a single corresponding functionality. Functionality can evolve rapidly during active development. But once you get out of the prototyping stage, roles tend to firm up. An added feature will often lead to a new role being created rather than to large-scale modification of existing roles.
The tech industry has some of the lowest team membership persistence figures. Developers form and disperse teams frequently. Highly productive engineers are usually floating members of multiple units. And specialists are commonly borrowed from other groups during crunch times and incidents.
What are the consequences of the flux in terms of access control? Open up your console and take a look. Some users have access to stuff from two years ago. You'll see a few people who left the company but are still group members on six different teams.
Roles form a natural, actionable, and human-comprehensible checklist of permissions. They can go dormant if nobody uses them. Once they are needed again, all the correct policies are already available on re-activation. You may reset the role access occasionally, say when a new team is formed, asking team members to enumerate what roles they must have. People remain in flux; roles do not change much.
Stasis is better for security because the worst access control mishaps occur when multiple actors introduce changes. Fewer total changes mean fewer opportunities for mistakes. Access control built around roles ensures that most changes will be of the same kind: distributing hats to people. Roles lessen your cognitive load when you are making changes. Making fewer and less complicated changes is a recipe for security success.
3. Monitoring
Roles simplify monitoring by logically batching access permissions. AWS already has over four thousand different actions an entity may perform. You will never be able to associate them with each other at scale.
A role usage log presents the most digestible bird's-eye-view record of system events. Filter by role and by session to zoom in and instantly get an idea of what is happening. You do not have to understand each action individually right away to foresee the outcomes.
Monitoring role usage allows you to minimize privileges faster than anything else. If you send an email to a team member asking, "Hey, do you still need to use AWS organizations:ListOrganizationalUnitsForParent and autoscaling:DisableMetricsCollection?" do you expect a rational and timely response? "Hey, you have not used the role QueueImageCompressionForEastCoastCDN since last quarter. Do you still need it?" is more effective.
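A minimal sketch of how such a "do you still need this hat?" list might be produced. It assumes each record is shaped like the Role structure that IAM's GetRole returns, with a RoleLastUsed.LastUsedDate timestamp; the function name and the 90-day threshold are hypothetical choices, not anything AWS prescribes.

```python
from datetime import datetime, timedelta, timezone


def find_stale_roles(roles, now, max_idle_days=90):
    """Return the sorted names of roles with no recorded use in max_idle_days.

    Each record is assumed to look like the Role structure from IAM GetRole:
    {"RoleName": str, "RoleLastUsed": {"LastUsedDate": datetime}}.
    A record without LastUsedDate has no recorded use and counts as stale.
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for role in roles:
        last_used = role.get("RoleLastUsed", {}).get("LastUsedDate")
        if last_used is None or last_used < cutoff:
            stale.append(role["RoleName"])
    return sorted(stale)
```

Each name in the result becomes one short, answerable question to the role's owner, instead of a list of individual actions nobody remembers.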
Role-bound alarms and notifications make more sense. Issue alarms when roles are modified so that multiple people can review the changes. Usage of powerful roles is easier to follow than triggers based directly on actions.
Effective monitoring is all about specificity and reading speed, both of which increase visibility. Counter to intuition, observing more events does not automatically increase visibility. A firehose of events can just as easily confuse the observer and drown significant events in a sea of traces. Roles increase security by making monitoring more straightforward: they associate disparate events with the high-level functions being performed.
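A role-modification alarm could be sketched as a filter over CloudTrail-style records. The assumption here is that each record carries the eventSource, eventName, and requestParameters.roleName fields that CloudTrail logs for IAM calls; the function name and alert format are hypothetical, and a real setup would more likely hang this logic off an event rule than a Python loop.

```python
# IAM API calls that change what a role can do or who may assume it.
ROLE_CHANGE_EVENTS = {
    "AttachRolePolicy",
    "DetachRolePolicy",
    "PutRolePolicy",
    "DeleteRolePolicy",
    "UpdateAssumeRolePolicy",
}


def role_change_alerts(records, watched_roles):
    """Return one alert line per record that modifies a watched role."""
    alerts = []
    for record in records:
        if record.get("eventSource") != "iam.amazonaws.com":
            continue
        if record.get("eventName") not in ROLE_CHANGE_EVENTS:
            continue
        # requestParameters may be absent or null on some records.
        role_name = (record.get("requestParameters") or {}).get("roleName")
        if role_name in watched_roles:
            alerts.append(f"{record['eventName']} on {role_name}")
    return alerts
```

Five event names cover the changes worth a second pair of eyes, which is far easier to reason about than alarms scattered across thousands of individual actions.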
4. Automation
Roles are the only AWS access control mechanism that integrates seamlessly with automation. Lambda functions require a role. EC2 instances assume roles via instance profiles, so you do not have to manage credentials directly.
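What lets a service wear a hat is the role's trust policy. Below is a small sketch that builds one as a plain Python dictionary in the standard IAM policy JSON shape; the helper name is hypothetical, and the permissions attached to the role are a separate document not shown here.

```python
import json


def service_trust_policy(service_principal: str) -> dict:
    """Build a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# One trust policy per kind of automation that needs to wear the hat.
lambda_trust = service_trust_policy("lambda.amazonaws.com")
ec2_trust = service_trust_policy("ec2.amazonaws.com")
print(json.dumps(lambda_trust, indent=2))
```

No access key appears anywhere in this document, which is the whole appeal: the service obtains short-lived credentials by assuming the role.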
Not everything should be automated. But when you see a good opportunity for automation, you should take it. Have you already set up access control using roles? Then you have eliminated a step and reduced security risk. Pass the hat to the instance: no new policies to write and no old policies to check for trailing access rights. Revoke access from humans by removing role assumption rights from groups, like you usually do.
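Handing hats to humans through a group might look like the sketch below: a policy, attached to the group, that allows sts:AssumeRole on a short list of role ARNs. The helper name, account ID, and role names are placeholders, and the role's own trust policy must also permit the assumption.

```python
def group_assume_role_policy(account_id, role_names):
    """Build a group policy that lets members assume only the listed roles."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # One ARN per hat; sorted so that diffs stay readable in review.
                "Resource": [
                    f"arn:aws:iam::{account_id}:role/{name}"
                    for name in sorted(role_names)
                ],
            }
        ],
    }


# Placeholder account ID and role names, for illustration only.
policy = group_assume_role_policy(
    "123456789012", ["RunWebApplication", "BackupDatabase"]
)
```

Revoking a hat from the whole team is then a one-line change: drop the role name from the list.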
What if you do not yet have role-based access control? Are you using account access keys for automation instead? How are those keys passed to the instances? How often are you rotating them? Who else has access to the account from which the access keys are taken? You get the point: you need roles.
Conclusion
When it comes to AWS security, role-based access control is great because it is better at keeping least privilege, system stasis, and ease of monitoring at scale, and it integrates seamlessly with automation.
Smaller logical units are more granular. Verbs are easier to understand and recall than nouns. When changes are introduced to roles, they tend to be less expansive and maintain a higher level of stasis.
In turn, monitoring is supercharged with increased visibility because stasis fosters familiarity. To make the most of this approach, create a unique role for each functionality and use groups to distribute roles. Naming roles well is paramount. Follow the same principles and patterns used for naming functions, like VerbObjectCondition or VerbObjectPurpose. Safe cloud crafting!