Notes
- 1: AWS
- 1.1: API Gateway
- 1.2: Associate Solutions Architect Exam
- 1.3: Cloudfront
- 1.4: Cognito
- 1.5: Databases
- 1.6: EC2
- 1.7: ELB
- 1.8: IAM
- 1.9: Kinesis
- 1.10: Lambda
- 1.11: Route 53
- 1.12: S3
- 1.13: SNS
- 1.14: SQS
- 1.15: SWF
- 1.16: VPC
- 2: C#
- 3: Crypto
- 3.1: Crypto Learning
- 3.2: Bitcoin
- 3.3: General Crypto Notes
- 3.4: Hands On
- 3.4.1: Login
- 3.5: Helium
- 4: Docker
- 4.1: Docker Registry
- 5: Linux
- 6: Machine Learning
- 6.1: mlbase
- 6.2: Machine Learning
- 6.3: Misc
- 6.3.1: Github Copilot
- 6.3.2: Text Inversion (Dreambooth) Training
- 6.4: Tools
- 6.4.1: Machine Learning Tools
- 6.4.2: Background Remover
- 6.4.3: Diffusion Clip
- 6.4.4: GFPGAN
- 6.4.5: Maxim
- 6.4.6: Real Esrgan
- 7: Media
- 7.1: Working with Videos
- 8: Node
- 8.1: Security
- 8.2: Snippets
- 8.3: Using Docker with Node
- 9: Security
- 9.1: nmap
- 9.2: OpenSSL
- 9.3: SSL Stripping
- 9.4: Staying Anonymous
- 9.5: WiFi
- 10: Shell
- 10.1: Bash
- 10.2: Powershell
1 - AWS
1.1 - API Gateway
Facts
- Automatically scales
- Can cache
- Can throttle
- Can log to Cloudwatch
- Ensure you enable CORS if required by your application
1.2 - Associate Solutions Architect Exam
Direct Connect - direct line from your data center to AWS
KMS - allows you to import your own keys, disable/re-enable keys, and define key management roles
AWS Shield - Protects against DDOS
AWS Macie - Uses ML to protect sensitive data
AWS WAF - Protects against XSS attacks and can block IP addresses
Consolidated billing allows bills coming from multiple AWS accounts to be rolled up to a single bill. Charges are still traceable to their original accounts, there is no charge for consolidation, and it could potentially reduce the overall bill
AWS Trusted Advisor - Will advise on security, such as whether MFA is configured on the root account, and will call out security groups and ports that have unrestricted access
1.3 - Cloudfront
Concepts
- Edge Location - Location where the content will be cached
- Origin - Where the files to distribute come from (S3, EC2, ELB, Route53)
- Distribution - Consists of a set of edge locations
- RTMP - Used for Media Streaming
Facts
- Edge locations are read/write
- Objects are cached for their TTL
- Clearing cached objects incurs a cost
Useful Links
1.4 - Cognito
Facts
- Can provide identity federation with Google, Facebook, or Amazon
- Can be the identity broker for your application
- User Pools handle things like registration, authentication, and account recovery (a minimal sign-in sketch follows this list)
- Identity pools authorize access to AWS resources
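A rough sketch of authenticating against a user pool with boto3, assuming a hypothetical app client with the USER_PASSWORD_AUTH flow enabled (the client id and credentials below are placeholders):
import boto3

client = boto3.client("cognito-idp")

# Hypothetical app client id and user credentials
resp = client.initiate_auth(
    ClientId="example-app-client-id",
    AuthFlow="USER_PASSWORD_AUTH",
    AuthParameters={"USERNAME": "someuser", "PASSWORD": "somepassword"},
)

# On success the response carries JWTs the app can present to downstream services
tokens = resp["AuthenticationResult"]
print(tokens["IdToken"], tokens["AccessToken"])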
1.5 - Databases
RDS
- Read Replicas are for performance
- Multi-AZ is for DR
- SQL, MySQL, PostgreSQL, Oracle, Aurora, MariaDB
- Runs on VMs but you do not have OS-level access to them
- Patched by Amazon
- Not serverless (except for Aurora Serverless)
- Encryption at rest is supported
- SQL Server and Oracle can have a maximum of 2 databases per instance
- Aurora
- 2 copies of the data are stored in each AZ, across a minimum of 3 AZs
- Snapshots can be shared across accounts
- Automated backups turned on by default
DynamoDB
- No SQL
- Uses SSDs
- Spread across 3 geographically distinct data centers
- Eventually consistent reads by default, but strongly consistent reads can be enabled for a cost (see the sketch after this list)
- Name and value combined cannot exceed 400KB
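A minimal boto3 sketch of the read-consistency point above; the table name and key are made up, and strong consistency is opted into per request:
import boto3

# Hypothetical table with a partition key named "id"
table = boto3.resource("dynamodb").Table("example-table")

# Default: eventually consistent read
item = table.get_item(Key={"id": "123"})

# Opt in to a strongly consistent read (consumes more read capacity)
item = table.get_item(Key={"id": "123"}, ConsistentRead=True)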
Elasticache
- Memcached (caching)
- Redis (caching + pub/sub)
Redshift
- For Business Intelligence or Data Warehousing
- Only available in 1 AZ
- Can restore snapshots to new AZs if there is an outage
- Can retain backups for a maximum of 35 days
Useful Links
1.6 - EC2
Instance Types
- Field Programmable Gate Array (F1) - Genomics research, financial analytics, video processing, big data
- High Speed Storage (I3) - NoSql DB’s, Data warehousing
- Graphics Intensive (G3) - Video Encoding, 3D Application Streaming
- High Disk Throughput (H1) - Map Reduce based workloads, distributed file systems
- Low cost, General Purpose (T3) - Web Servers, small DB’s
- Dense Storage (D2) - File servers, Data warehousing, hadoop
- Memory Optimized (R5) - Memory Intensive Apps/DB’s
- General Purpose (M5) - Application Servers
- Compute Optimized (C5) - CPU Intensive Apps/DB’s
- Arm-based (A1) - Scale-out workloads
Pricing
- On Demand - Fixed rate, no commitment
- Reserved - Capacity reservation and discount with upfront commitment of 1 or 3 years
- Spot - Bid for a price you want to pay. If terminated by Amazon you will not be charged for a partial hour of usage.
- Dedicated - Physical EC2 server dedicated for your use
Placement Groups
- Clustered - Low latency/High Throughput, single az
- Spread - Individual Critical, can be multi az
- Partitioned - Multiple EC2 instances, can be multi az
- Name must be unique for your account
- Not all types can be in placement groups
- Can't move an existing instance into a placement group
Facts
- Termination protection is off by default
- Instance Store Volumes are ephemeral
- Retrieve metadata for an instance with
curl http://169.254.169.254/latest/meta-data/
- Retrieve user data for an instance with
curl http://169.254.169.254/latest/user-data/
- When a dedicated host is stopped you can switch it between “dedicated” (single-tenant hardware) and “host” (isolated server), but not back to “default” (shared hardware)
Storage
EBS
Elastic Block Store (for most EC2 workloads).
Types
- General Purpose SSD (gp2) - Most work loads
- Provisioned IOPS SSD (io2) - Databases
- Throughput Optimized HDD (st1) - Big Data/Data Warehouses
- Cold HDD (sc1) - File Servers
- EBS Magnetic (Standard) - Infrequently accessed data
Facts
- Root EBS volumes can be encrypted (so can other volumes)
- EBS Snapshots exist on S3
- EBS Snapshots are incremental
- Snapshots should not be taken of a root volume when an instance is running
- EBS volume sizes can be changed on the fly
- EBS Volumes will always be in the same AZ as the instance they are attached to
- By default the root EBS volume is destroyed if an instance is terminated
EFS
Elastic File System (super scalable NFS).
- Supports the NFSv4 protocol
- Does not require pre-provisioning
- Can scale to petabytes
- Can support thousands of concurrent connections
- Provides read after write consistency
Useful Links
1.7 - ELB
Concepts
- Application Load Balancer - Layer 7, can route based off application needs.
- Network Load Balancer - Layer 4, can route based off network information
- Classic Load Balancer - Legacy option, basic Layer 4/Layer 7 load balancing
Facts
- 504 means the gateway has timed out. This means there is an issue with your application
- The end user’s IPv4 address is available in the
X-Forwarded-For
header
- You are only ever given a DNS name for the load balancer, never an IP address
- With cross zone load balancing you are able to equally distribute load across instances in multiple AZ’s; without it you can distribute load evenly between multiple AZ’s but not evenly across instances
- Sticky sessions can be configured so one user is always routed to the same instance
- Path patterns allow you to route to instances based off the path of the request
Useful Links
1.8 - IAM
Concepts
- Users
- Groups - Can be used to organize users and their permissions
- Roles - Used for AWS resources to authenticate with each other. Access/Secret Keys should never be used by AWS resources.
- Policies - JSON Document describing what resources can be accessed and in what capacity (see the sketch after this list)
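As a rough illustration of what such a JSON document looks like, here is a hypothetical read-only S3 policy created with boto3 (the policy name, bucket, and ARNs are placeholders):
import json
import boto3

# Hypothetical policy allowing read-only access to a single bucket
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(PolicyName="example-s3-read-only", PolicyDocument=json.dumps(policy))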
Facts
- Global not regional
- Users have no permissions when created
- Root account is the account used when creating the organization, it should be secured and then not used
1.9 - Kinesis
For working with streaming data.
Types
- Kinesis Streams - Producers send data to shards, where it is available to consumers for between 24 hours and 7 days. Typically consumed by an EC2 instance and forwarded to a data store (such as DynamoDB, S3, EMR, or Redshift) where it can be processed further (see the producer sketch after this list)
- Kinesis Firehose - Data must be processed right away, typically sent to elasticsearch, s3, or redshift (via s3)
- Kinesis Analytics - Can be used in conjunction with streams or firehose, automatically processes data right away
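A minimal producer sketch with boto3 (the stream name and payload are made up); the partition key determines which shard a record lands on:
import json
import boto3

kinesis = boto3.client("kinesis")

# Hypothetical stream and event - the partition key controls shard placement
kinesis.put_record(
    StreamName="example-stream",
    Data=json.dumps({"event": "click", "user": "123"}).encode("utf-8"),
    PartitionKey="user-123",
)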
1.10 - Lambda
Facts
- Priced on the amount of memory assigned combined with the duration of execution
- Scales out automatically
- Each event triggers a unique instance of a lambda function (a minimal handler sketch follows this list)
- One function can trigger one or more other functions
- X-ray can be used to debug serverless applications
- Lambda can perform operations globally
- Lambda Triggers
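For reference, a minimal Python handler; each invocation receives the triggering event plus a context object (the return shape below assumes an API Gateway proxy trigger):
import json

def handler(event, context):
    # "event" is the payload from whatever triggered the function (S3, SQS, API Gateway, ...)
    # "context" carries metadata such as the request id and remaining execution time
    return {
        "statusCode": 200,
        "body": json.dumps({"received": event}),
    }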
1.11 - Route 53
Routing Policies
- Simple - No health checks
- Weighted - Split requests by %, supports health checks (see the sketch after this list)
- Latency - Sends traffic to region with lowest latency
- Failover - Active/Passive Routing
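A rough boto3 sketch of a weighted record (the hosted zone id, name, and IP are placeholders); two records with the same name but different SetIdentifier values split traffic according to their relative weights:
import boto3

route53 = boto3.client("route53")

# Hypothetical zone and target - this record receives 70/(sum of weights) of requests
route53.change_resource_record_sets(
    HostedZoneId="EXAMPLEZONEID",
    ChangeBatch={
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": "app.example.com",
                    "Type": "A",
                    "SetIdentifier": "blue",
                    "Weight": 70,
                    "TTL": 60,
                    "ResourceRecords": [{"Value": "203.0.113.10"}],
                },
            }
        ]
    },
)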
Common DNS Types
- Start of authority record (SOA) - Specifies authoritative information about a DNS zone, including the primary name server, the email of the domain administrator, the domain serial number, and several timers relating to refreshing the zone.
- Nameserver (NS) - Delegates a DNS zone to use the given authoritative name servers
- Address (A) - IP address to direct traffic to
- Canonical Name (CNAME) - Alias of one name to another
- Mail Exchange (MX) - Maps a domain name to a list of message transfer agents
- PTR - Pointer to a canonical name
Facts
- You can register domains on AWS
- Sometimes it can take days to register a new domain name
- You can integrate SNS to be notified of health check failures
- Health checks can be applied to individual record sets
- Given the choice always choose an alias record over a CNAME
1.12 - S3
Tiers
S3 Standard - 4 9’s Availability, 11 9’s durability, designed to sustain the concurrent loss of 2 data centers
S3 Standard IA - For data that is infrequently accessed but requires rapid access when it is accessed (see the upload sketch after this list)
S3 One Zone IA - For data that does not require multi-AZ durability, is infrequently accessed, but requires rapid access when it is accessed
S3 Glacier - Secure, durable, cheap. Can take minutes to hours to retrieve (configurable)
S3 Glacier Deep Archive - Can take up to 12 hours to retrieve
S3 Intelligent-Tiered - Automatically adjusts tier of objects to optimize cost while maintaining performance
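The tier can be chosen per object at upload time; a small boto3 sketch (the bucket and key are made up):
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket/key - store the object directly in Standard IA
s3.put_object(
    Bucket="example-bucket",
    Key="reports/2020-01.csv",
    Body=b"col1,col2\n1,2\n",
    StorageClass="STANDARD_IA",  # e.g. STANDARD, STANDARD_IA, ONEZONE_IA, GLACIER, DEEP_ARCHIVE, INTELLIGENT_TIERING
)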
Pricing
- Storage
- Requests
- Storage Management
- Data Transfer
- Transfer Acceleration
- Cross Region Replication
Object Attributes
- Key
- Value (object)
- Version ID
- Metadata
Concepts
- Versioning - Stores a version of every change (including deletes). Charged storage for each version.
- Life Cycle Management - Can be used to manage the storage tier of objects based off rules you define or automatically. Can be applied to current and previous versions (see the sketch after this list).
- Cross Region Replication - Can automatically replicate objects to a different bucket in another region. Does not apply to objects created before Cross Region Replication is configured. Versioning must be enabled (in both buckets), and delete markers are not replicated.
- Acceleration - When using acceleration you always upload to edge locations rather than directly to the bucket, then the file is transferred along Amazon’s backbone to the data center(s) your bucket resides in. Acceleration does not always result in faster uploads.
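A minimal lifecycle rule sketch with boto3, assuming a hypothetical bucket with a logs/ prefix; objects transition to Standard IA after 30 days, Glacier after 90, and expire after a year:
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)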
Facts
- Files can be 0B-5TB
- Bucket names must be globally unique
- Supports MFA Delete
- Read after write consistency for PUTs of new objects
- Eventual Consistency for overwrite PUTs and DELETEs
- Once enabled, versioning can never be disabled, only suspended
- Snowball is a physical device that can be used to import or export data from S3
Storage Gateways
Physical device with direct line to Amazon.
- File Gateway - For flat files stored directly on S3
- Volume Gateway
- Stored Volumes - Entire dataset is stored on site and asynchronously backed up to S3
- Cached Volumes - Entire dataset is stored on S3 and most frequently accessed data is cached on site
Useful Links
1.13 - SNS
Facts
- Push based
- Simple API’s
- Flexible message delivery over multiple transport protocols
- Cheap
- Can be used to de-couple your infrastructure (see the publish sketch after this list)
- Standard SQS - Order is not guaranteed and messages can be delivered multiple times
- FIFO SQS - Order is strictly maintained and messages are delivered only once
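A minimal publish sketch with boto3 (the topic ARN is a placeholder); SNS pushes the message to every subscriber of the topic (SQS queues, Lambda functions, HTTP endpoints, email, ...):
import json
import boto3

sns = boto3.client("sns")

# Hypothetical topic - every subscription on it receives this message
sns.publish(
    TopicArn="arn:aws:sns:us-east-1:123456789012:example-topic",
    Subject="order-created",
    Message=json.dumps({"order_id": "abc-123"}),
)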
1.14 - SQS
Facts
- Pull based
- Messages can be up to 256KB in size
- Messages can be in the queue from 1 minute to 14 days, default retention is 4 days
- Visibility Time Out - the amount of time the message will be invisible after it is picked up. If the job finishes and deletes the message before the visibility timeout expires, it is removed from the queue, otherwise it is made visible again. The default is 30 seconds and the maximum is 12 hours (see the consumer sketch after this list)
- Guarantees messages will be processed at least once
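A rough consumer sketch with boto3 showing the visibility timeout in action (the queue URL is a placeholder); if the delete call never happens, the message becomes visible again once the timeout expires:
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # placeholder

# Pull a message; it stays invisible to other consumers for 60 seconds
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    WaitTimeSeconds=10,
    VisibilityTimeout=60,
)

for msg in resp.get("Messages", []):
    print(msg["Body"])  # do your work here
    # Deleting before the visibility timeout expires prevents redelivery
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])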
Useful Links
1.15 - SWF
Facts
- Workflow executions can last up to a year
- Task Oriented API (vs message oriented)
- Ensures a task is assigned only once and is never duplicated
- Actors
- Workflow Starters - Anything that can initiate a workflow
- Deciders - Control the flow of tasks in the workflow
- Activity Workers - Perform the tasks
1.16 - VPC
Concepts
- Internet Gateway (IGW) - An internet gateway is a horizontally scaled, redundant, and highly available VPC component that allows communication between instances in your VPC and the internet. It therefore imposes no availability risks or bandwidth constraints on your network traffic. Only one internet gateway can exist per VPC
- Virtual Private Gateways - Allows you to peer your local network with a VPC
- Egress-Only Internet Gateway - Prevents IPv6 based internet resources from connecting into a VPC while allowing IPv6 traffic to the internet
- Route Tables - A route table contains a set of rules, called routes, that are used to determine where network traffic from your subnet or gateway is directed.
- Network ACL
- Default ACL comes with each VPC and allows all inbound and outbound traffic
- Custom ACL’s deny all inbound and outbound traffic by default
- Each subnet must be associated with an ACL, if one is not explicitly attached the default ACL is applied
- ACL’s allow you to block IP Addresses
- A single ACL can be attached to multiple subnets
- Each rule is numbered, rules are evaluated in order
- Inbound and Outbound rules are separate
- Subnets
- A single subnet cannot span multiple AZ’s
- A public subnet always has at least one route in its table that uses an IGW
- AWS reserves the first 4 IP addresses and the last IP address of each subnet’s CIDR block
- Security Groups
- All inbound traffic is blocked by default
- All outbound traffic is allowed by default
- Changes take effect immediately
- Unique to each VPC
- multiple groups can be assigned to a single instance
- multiple instances can be assigned to a single group
- Can specify allow rules but not deny rules
- NAT Instances - provide internet access
- Must be in public subnet
- Disable source/destination check on the instance
- There must be a route from the private subnet to the NAT instance for instances there to be able to use it
- If there is a bottleneck consider making the instance larger
- Can be HA if it is in an Autoscaling Group and failover is scripted
- Uses security groups
- Cannot be used as a bastion
- NAT Gateways - provide internet access
- Redundant within a single AZ
- 5Gbps to 45Gbps
- Does not use security groups
- No need to patch or disable source/destination checks
- Automatically gets public IP
- If using multiple AZ’s put a NAT Gateway in each AZ with appropriate routing to ensure availability
- Flow Logs
- Log traffic within a VPC
- Cannot enable flow logs for peered VPC’s unless those VPC’s are in your account
- Flow logs cannot be tagged
- Internal DNS Traffic is not logged
- Traffic generated for windows license validation is not logged
- Traffic to/from 169.254.169.254 is not logged
- DHCP Traffic is not logged
- Can be generated at the network interface, subnet, and VPC levels
- VPC Endpoints - allows traffic to AWS services to stay within AWS. Endpoints are virtual, horizontally scaled, and highly available
- Interface Endpoint - API Gateway, Cloudformation, Cloudwatch, CodeBuild, Config, EC2 API, ELB API, Kinesis, KMS, SageMaker, Secrets Manager, STS, Service Catalog, SNS, SQS, Systems Manager, Endpoints in another AWS account
- Gateway Endpoints - DynamoDB, S3
Facts
- No Transitive Peering
- Security Groups are stateful, Network ACL’s are stateless
- When creating a custom VPC a Route Table, ACL, and Security Group are all automatically created
- A VPN connection consists of a customer gateway and a virtual private gateway
- By design Amazon DNS ignores requests coming from outside a VPC
Useful Links
2 - C#
2.1 - Making Rest Calls in C#
These have been tested with .Net 4.5
Basic Auth Example
request.Headers.Authorization = new AuthenticationHeaderValue( "Basic", Convert.ToBase64String( ASCIIEncoding.ASCII.GetBytes(
string.Format( "{0}:{1}", user, password ) ) ) );
Generic Request Example
public async Task<T> MakeRequest<T>( Uri uri )
{
var client = new HttpClient { Timeout = new TimeSpan( 0, 0, Settings.Default.timeout ) };
try
{
HttpResponseMessage response = await client.GetAsync( uri );
if ( response.IsSuccessStatusCode )
{
var content = await response.Content.ReadAsStringAsync( );
try
{
return JsonConvert.DeserializeObject<T>( content );
}
catch ( Exception e )
{
throw new Exception( String.Format( "Error deserializing {0}, additional message: {1}", content, e.Message ) );
}
}
else
{
throw new Exception( String.Format( "Error getting response from {0}, Status code: {1}", uri, response.StatusCode ) );
}
}
catch ( Exception e )
{
log.Error( string.Format( "Error making request to {0}: {1}", uri, e.Message ) );
throw;
}
}
3 - Crypto
Source and code support notes can be found in beverts312/crypto-learning. The UI for experiments is at https://crypto.baileyeverts.com and the apis are at https://crypto.baileyeverts.net.
Resources
- Coinbase Earn/Learn (free crypto) - Crypto reward stuff is super easy/lite, some blog posts get into good topics
- Rabbithole (free crypto) - More advanced, really cool
- Thirdweb - Free tooling (they take a cut of what you make), great tutorials even if you don’t want to use their tools
- Messari - Great data/research, strongly recommend this paper
3.1 -
Crypto Learning
For learning crypto things
3.2 - Bitcoin
Random Facts
- If you bought $100 of gold and $100 of BTC in December of 2011, the gold would be worth $102 and the bitcoin would be worth $1.7 million (December 2021, Messari)
- Bitcoin does 300k settlements/day vs 800k/day for Fedwire, but bitcoin settlements are often batched, so the bitcoin network is probably already clearing more transactions (December 2021, Messari)
- In the US we flare the equivalent of 150TWh/day, over 8x as much energy as the BTC network uses in a year (December 2021, Messari)
Useful Resources
3.3 - General Crypto Notes
3.3.1 - Domains
Ethereum Name Service (ENS) and Unstoppable Domain Service (UNS) are 2 different services which surface information from domain records which are stored on the blockchain. Primarily both services make it easier to route payments to addresses but each service has additional capabilities.
Check out my tool for resolving ENS/UNS domains here.
 | ENS | UNS |
---|---|---|
Blockchain | Ethereum | Polygon |
Site | https://app.ens.domains/ | https://unstoppabledomains.com/ |
TLD’s | .eth | .crypto , .nft , .x , .coin , .wallet , .bitcoin , .dao , .888 , .blockchain |
Registration Term | Annual (can bundle many years together) | Forever |
Payment | Ethereum | Fiat, crypto |
Gas | Used for registration/Updates | Used for registration/updates but it is covered by Unstoppable |
Other Capabilities | When used with a compatible browser/extension can be used as a DNS service for IPFS sites, provides a login service | |
My domains (OpenSea Links) | everts.eth | everts.crypto |
3.3.2 - Myths
Crypto currency is just used by criminals
0.34% of cryptocurrency transactions are illicit (Messari, December 2021); that is a smaller percentage than in traditional finance.
3.3.3 - Terms
Term | Definition |
---|---|
TradFi | Traditional Finance |
DAO | Decentralized Autonomous Organization |
3.4 - Hands On
3.4.1 - Login
Every Ethereum wallet consists of a public key and a private key; the private key is only known (or should only be known) by the owner of the wallet. Because of this we can validate wallet ownership by asking the owner to sign a generated message with their private key and then validating that the signed message matches what we would expect using the public key.
Tools like Metamask allow us to interact with wallets using javascript. Metamask can manage the private keys itself or the private keys can be managed on a secure hardware wallet such as a ledger and anytime metamask needs to perform an operation that leverages the private key it will offload that part of the flow to the hardware wallet.
For the UI layer I chose to use ethers.js to make it easier to interact with the ethereum blockchain.
This is roughly the ui login code:
const provider = new ethers.providers.Web3Provider(window.ethereum); // Initialize ethers
await provider.send("eth_requestAccounts", []); // Prompt to connect wallet (if not already connected)
const signer = provider.getSigner(); // Initialize signer
const address = await signer.getAddress(); // Get the connected address (multiple addresses can be managed by metamask)
const challenge = await getChallenge(address); // Retrieve challenge from api (simple fetch api call)
const signedChallenge = await signer.signMessage(challenge); // Ask to sign message, will prompt user in ui and on hardware wallet if connected
const jwt = await getJwt(address, signedChallenge); // Retrieve jwt from api (simple fetch api call)
For the api I chose to use web3.py to make it easier to interact with the Ethereum blockchain. To create the challenge I simply generate a uuid, the uuid is stored in the db (associated with the user who requested the challenge) and then returned to the user.
This is roughly the code to validate the signature:
from web3 import Web3
from eth_account.messages import encode_defunct

stored_challenge = UserChallenge.get_challenge(addr).get("challenge") # get challenge from db
w3 = Web3() # initialize web3 lib
account = w3.eth.account.recover_message( # use stored challenge with signed challenge to recover the signing address
    encode_defunct(text=stored_challenge),
    signature=challenge_to_validate,
)
if account == addr: # ensure signing address matches challenge address
    return generate_jwt(addr)
else:
    return 401
3.5 - Helium
Helium is a decentralized wireless network
Components
- WHIP - Narrowband wireless protocol; cheap, long range, low power, open source
- Hotspots - provide wireless coverage, connect devices and routers
- Devices - connect to hotspots using WHIP
- Routers - internet deployed apps, receive traffic from devices (via hotspots) and route it where it needs to go. HA routers are provided by the network but you can run your own
Concepts
- Proof of coverage - For proving hotspots are providing wireless coverage to an area
- Proof of serialization - To achieve time consensus, used to validate proof of coverage is real
- Proof of location - For proving hotspots are where they say they are, uses TDoA (whitepaper pg 12)
- Helium Consensus Protocol - Based off HoneyBadgerBFT, miners submit proofs which translate to scores for miners, best miners get elected to consensus group
Other Concepts
Full Node vs Light Client - full nodes have full history (routers), light clients only have a moving window of history (hotspots)
Useful Resources
- main site
- helium explorer (on my hotspot)
- whitepaper
4 - Docker
4.1 - Docker Registry
Docker provides a free registry which is distributed as a docker image. While it is easy to get up and running with a registry, maintaining one long term isn’t what most people want to spend their time doing. With that in mind I would consider using a SaaS offering such as Docker Hub, Google Container Registry, or EC2 Container Registry.
Set up a docker registry with Basic Auth
- Provision an Ubuntu machine, Install docker, docker-compose, apache2-utils, openssl
- Create a self-signed cert and key and place them in /apps/certs (alternatively obtain a real cert)
One method of creating a selfsigned cert is this:
echo 01 | sudo tee ca.srl > /dev/null
openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 365 -out domain.crt
- Create a htpasswd file in /apps/auth with your user(s) and password(s) htpasswd -cB passwordfile user
- Create a file docker-compose.yml similar to this:
apps:
restart: always
image: registry:2.2
ports:
- 5000:5000
environment:
REGISTRY_HTTP_TLS_CERTIFICATE: /certs/domain.crt
REGISTRY_HTTP_TLS_KEY: /certs/domain.key
REGISTRY_AUTH: htpasswd
REGISTRY_AUTH_HTPASSWD_PATH: /auth/passwordfile
REGISTRY_AUTH_HTPASSWD_REALM: Registry Realm
volumes:
- /apps/certs:/certs
- /apps/auth:/auth
- From the directory of your docker-compose.yml run
docker-compose up -d
Using the registry
- You will need to have the insecure-registry flag added to your docker daemon options on all hosts that will interact with the registry (required for any registry without a trusted cert)
- You will need to login to the docker registry before you interact with it: docker login
- Login will prompt you for an email, you can enter anything
5 - Linux
5.1 - Centos 7
Useful Commands for Centos 7
- Disable GUI on Startup -
systemctl set-default multi-user.target
- Enable GUI on Startup -
systemctl set-default graphical.target
5.2 - Firewall
General
On most distributions these commands need to be run as root. Remember that there should be multiple layers of protection between your app and the internet, and you may need to adjust the configuration of other layers of defense to allow traffic through.
Centos/RHEL 7
- Install firewalld -
yum install firewalld
- Enable firewalld -
systemctl enable firewalld
- Start firewalld -
systemctl start firewalld
- Check current rules -
firewall-cmd --list-all
- Open port -
firewall-cmd --zone=public --add-port=[number]/[protocol] --permanent && firewall-cmd --reload
Ubuntu
- Enable/Disable Firewall -
ufw [enable/disable]
- Open/Close port -
ufw [allow/deny] [port]/[protocol]
- Allow/Deny from a specific IP -
ufw [allow/deny] [ip]
- Allow/Deny a specific service -
ufw [allow/deny] [service-name]
5.3 - PI Setup Tips
Password
Default is pi/raspberry. Use passwd to change it.
Keyboard
Default is British, we gotta fix that shit
sudo dpkg-reconfigure keyboard-configuration
Hit enter on the first screen, then Other, and English (US)
Wifi
Configure - /etc/wpa_supplicant/wpa_supplicant.conf
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
network={
ssid="NETWORK NAME"
psk="NETWORK PASSWORD"
}
Also Configure - /etc/network/interfaces
source-directory /etc/network/interfaces.d
auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
wpa-roam /etc/wpa_supplicant/wpa_supplicant.conf
Raspotify
Allows you to use your raspberry pi as a spotify connect client
- Install
curl -sL https://dtcooper.github.io/raspotify/install.sh | sh
- Configure
/etc/default/raspotify
5.4 - scp
scp
makes it easy to transfer files to/from a remote host.
Useful Commands
- Copy from remote to local -
scp [user]@[remote]:[remote file path] [local file path]
- Copy from local to remote -
scp [local file path] [user]@[remote]:[remote file path]
5.5 - tcpdump
tcpdump
is simple but powerful tool for analyzing network traffic.
- Start capturing
tcpdump -i any -w /tmp/http.log &
- Do your thing
- Stop Capturing
killall tcpdump
- Check it out
tcpdump -A -r /tmp/http.log | less
Filtering
To make your data easier to view you can scope the traffic tcpdump is capturing to only get what you are interested in.
Filter traffic going to a specific port: tcpdump dst port $PORT
Filter traffic going to a specific ip: tcpdump dst $IP
Capture traffic on a specific interface: tcpdump -i $INTERFACE
5.6 - Ubuntu
Ubuntu 18.04 - things to install
Applications
Name | What |
---|---|
Albert | Configurable Launcher (Alfred replacement) |
Peek | Record screen to make gifs |
Gnome Tweaks | Better control of gnome extensions/user prefs |
Boostnote | Notetaking |
Chromium | Web Browser |
Spotify | If you have to ask… |
Mailspring | Email client |
Gnome Extensions
Name | What |
---|---|
AlternateTab | Windows like alt + tab |
Battery Percentage | Display battery percentage |
Caffeine | Amphetamine replacement |
Clipboard Indicator | Clipboard history access |
Docker Integration | Docker toolbar |
Drop Down Terminal | Hot key accessible terminal |
Pixel Saver | More efficient window control layout |
5.7 - Wireshark
tshark -f "host 1.1.1.1 and tcp port 10101" -i any -w nhlDEV.01032019.pcap -F libpcap
6 - Machine Learning
These pages come from beverts312/machine-learning
GPU’s in the cloud
Service | Pricing | Template Support | Notes |
---|---|---|---|
Replicate | Relatively expensive but per second billing (link) | A+, can expose an api to run a model directly | Super nice for trying out new models and running on demand, price will add up quickly tho |
vast.ai | Cheap (link) | No? | Can easily rent out your own hardware |
RunPod (referral link) | Cheap (link) | Yes, for environments | |
Google Colab | Free Option (link) | Yes, (a jupyter notebook) | Easy to use, good for trying out notebooks that don’t require a ton of resources |
Text to Image
Tools
Tool | Run yourself | Notes |
---|---|---|
Stable Diffusion | Yes | Open Source, easy to make your own models |
Dall-E | No | SaaS offering from Open AI, has api+webui |
Midjourney | No | SaaS offering, has webui+discord bot |
Prompt Help
Writing a good prompt is hard, here are some resources to help you out:
6.1 -
beverts312/mlbase
Base docker image for machine learning projects.
6.2 -
Machine Learning
For learning machine learning ;)
Docs are managed in this repo but best viewed in their hugo form here.
6.3 - Misc
6.3.1 - Github Copilot
Github Copilot is a new feature that allows you to write code using AI. It is currently in beta and is available for free to all Github users. It is currently available for Python, JavaScript, TypeScript, and Java.
The statement above was actually written by Copilot when I opened this file and started trying to type… it is also worth noting that the last statement of that paragraph leaves out the fact that it can help in any language that is used to develop software in the open (including markdown, obviously, as it wrote that statement in this file). Github Copilot is incredible and can greatly increase your productivity. It is not perfect, but it is getting better every day. It is also worth noting that it is not a replacement for a human programmer, but it can help you write code faster and more efficiently (also written by copilot). I use it inside VS Code using the copilot extension but there are integrations available for a variety of editors and IDEs.
I would definitely recommend trying it out, there is a 60 day free trial.
6.3.2 - Text Inversion (Dreambooth) Training
Data Preparation
The first thing you need to do in order to train a new model is to prepare the training data. You will need 2 sets of images:
- Training Images - These should be images of the thing you want to train on, for example if you want to teach your model about a new person, this should be pictures of that person (more details below)
- Regularization Images - These should be images that are not of the thing you want to train on but are of the same class, for example if you want to teach your model about a new person, this should include images of other people. The github user djbielejeski has a number of datasets that can be used for this purpose, in general the repo names follow the pattern of
https://github.com/djbielejeski/Stable-Diffusion-Regularization-Images-${CLASS}
, for example Stable-Diffusion-Regularization-Images-person_ddim contains images of people that can be used for regularization (helper script). An alternative to using a dataset like this would be to create a bunch of regularization images using stable-diffusion itself. In either case you will likely want to use about 200 images. You do not need to provide regularization images if training on replicate
All images should be 512x512 and in the png format.
For training images, you will need a variety of images of the thing you want to train on. They should be from different angles, different zoom levels, and with different backgrounds. I will update the table below as I experiment with more classes.
class | image distribution |
---|---|
person | 2-3 full body, 3-5 upper body, 5-12 close up on face |
Training
In order to train a new model you will need a high-end graphics card. I won’t be specific because this is a rapidly changing space, but you will likely have trouble with less than 24GB of VRAM. Since I do not have that kind of hardware I have been using RunPod (referral link), which offers pretty cheap GPUs in the cloud. If using RunPod, I would recommend using a configuration with:
- 1x GPU with 24GB of VRAM
- Default Container Disk Size
- 40 GB Volume Disk Size
- RunPod Stable Diffusion v1.5 (at time of writing v2 is available but does not seem to work as well, steps are roughly the same for v2)
I would recommend deploying on-demand but you can roll the dice and try to save money with a spot instance.
After the pod is provisioned I would connect to the web ui (“Connect via HTTP [Port 3000]” in the ui) in one tab and connect to the JupyterLab in another tab.
In the JupyterLab tab create 2 directories:
training_data - Place your training images in here
regularization_data - Place your regularization images in here
Then in the web ui tab:
- Go to the Dreambooth tab
- On create model, enter a name for your model, select a source checkpoint and click create
- After the model is created, move to the train tab
- Set the instance prompt to something that describes the training set using a key that you want to use in future generations, so if the key was “bge” the instance prompt could be “a photo of bge”
- Set the class prompt to something that describes the regularization data, so continuing with the previous example you could use “a photo of a person”
- Set the dataset directory to
/workspace/training_data
- Set the regularization dataset directory to
/workspace/regularization_data
- Set the number of training steps as desired, I would recommend starting with 1500
- Click train (should take 10-30 minutes depending on the hardware and dataset size)
Trying your model out: After training is completed you can go back to the txt2img
tab and try it out, if you don’t see your model available you may need to click the refresh icon next to the checkpoint dropdown.
Getting your model out: You can find the created checkpoints under /workspace/stable-diffusion-webui/models/Stable-Diffusion/${MODEL_NAME}_${STEPS}.ckpt
in your JupyterLab tab, from there you can download them and use them in the future.
An alternative to downloading it locally could be to open a terminal in your JupyterLab tab and upload the model to a cloud storage provider like wasabi or s3.
Using Replicate
An alternate option is to use replicate, however this is going to be much more expensive ($2.50 at a minimum, probably a fair bit more with a realistic dataset size). They have a nice blog post here on how to set that up. Here you can find a little script I wrote to simplify that process.
An advantage to this approach other than how easy it is to run training is that your model is immediately available for use via the replicate api. I plan to put together a little sample to make it easy to use the replicate api to run models trained using an alternate method such as the one described in the section above.
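Until that sample exists, here is a rough sketch of what calling a trained model through the replicate Python client looks like (the model slug, version hash, and prompt are placeholders; REPLICATE_API_TOKEN must be set in your environment):
import replicate

# Hypothetical model slug and version hash - use the one replicate gives you after training
output = replicate.run(
    "your-username/your-dreambooth-model:0123456789abcdef",
    input={"prompt": "a photo of bge on a mountain top"},
)

# For image models the output is typically a list of URLs to the generated images
print(output)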
Useful Resources
- /d8ahazard/sd_dreambooth_extension - Webui Dreambooth extension
- Automatic1111/stable-diffusion-webui - Feature rich fork of the stable diffusion web ui
- JoePenna/Dreambooth-Stable-Diffusion - This is a great reference repo where some work in this space is going on, there are a few guides there on how to train using different platforms using notebooks in that repo. I didn’t have a lot of success with them specifically but there is a lot of good information there.
6.4 - Tools
I have dockerized a number of open source tools for machine learning and data science.
For each tool you will find a docker-compose.yml
, Dockerfile
, and an info.yml
.
The info.yml
provides a standardized view of how to leverage the tool.
I have written scripts that use the information defined in the info.yml
files to make the tools easy to use in a consistent manner.
Getting Started
In order for the scripts to work, you will need to install the following:
- python3/pip3/virtualenv (then run
pip3 install -r requirements.txt
orpip3 install -r dev-requirements.txt
for development) - docker+docker-compose
- NVIDIA Container Toolkit
An interactive script (prepare.py) is provided to help:
- initialize volume directories used by the tools
- download required datasets/models/checkpoints
Docker
If you are using these for production or don’t want bloat I would recommend using your own images, these images are geared towards making things easy, not optimized
The compose file in each tool directory knows how to build the images (which are not currently on docker hub). All of the tools extend one of the base images defined in this repo:
- beverts312/mlbase - Based off
nvidia/cuda
, installs conda, common build tools, and sets up a non-root usermluser
- beverts312/machine-learning - Based off
beverts312/mlbase
, includes pytorch
A lot of data is downloaded in the course of building these images, if you need to share them across multiple machines in a local network I would recommend using a local registry (example config).
6.4.1 -
Machine Learning Tools
Contains a set of dockerized open source ML tools.
6.4.2 - Background Remover
Docker tooling for nadermx/backgroundremover. Can process image or video.
6.4.3 - Diffusion Clip
Docker tooling for gwang-kim/DiffusionCLIP.
The entrypoint is a highly opinionated wrapper on the edit single image
operation, to do other things or to override options override the entrypoint.
Args
Example: docker-compose run dc --model_path pretrained/imagenet_cubism_t601.pth --config imagenet.yml --img_path ../working/test.jpg
Flag | Value |
---|---|
--model_path | pretrained/${Name of model you put in checkpoints dir} |
--config | Either celeba.yml , imagenet.yml , afhq.yml or `` (read in source repo) |
--img_path | ../working/{Name of file you want to process in working dir} |
Volumes
Be sure to read in the source repo about what needs to go into pretrained
(checkpoints) and data
.
Local Path | Purpose |
---|---|
../../volumes/checkpoints | Pretrained models |
../../volumes/data | Data to train on |
../../volumes/working | Directory to put files into for processing and to get processed files out of |
../../volumes/cache | Python cache |
6.4.4 - GFPGAN
Docker tooling for TencentARC/GFPGAN. Upscales images, fixes faces.
6.4.5 - Maxim
Docker tooling for google-research/maxim.
Put input images in ../../volumes/working/input
.
Run docker-compose run ml $OPERATION
where $OPERATION
is one of: Denoising
, Deblurring
, Dehazing-Indoor
, Dehazing-Outdoor
, Deraining-Streak
, Deraining-Drop
, Enhancement
.
6.4.6 - Real Esrgan
Docker tooling for xinntao/Real-ESRGAN. Upscales images and fixes faces (using gfpgan).
Options:
-s
- Scale factor (default: 4)--face_enhance
- Enhance face using GFPGAN (default: False)--fp32
- Use fp32 precision during inference. Default: fp16 (half precision).
7 - Media
7.1 - Working with Videos
Analyze Video
mediainfo - Use -f
flag to get maximum info
FFMPEG has a probe tool that is great for extracting technical metadata from a variety of video files - ffprobe '$URL'
FFMPEG Video Tricks
Overlay information on a video (you provide start timecode, framerate):
ffmpeg -i $INPUT_VIDEO -vf "drawtext=fontsize=15:fontfile=/Library/Fonts/Arial\ Bold\.ttf:timecode='01\:00\:00\:00':rate=24:text='TCR\:':fontsize=52:fontcolor='white':boxcolor=0x000000AA:box=1:x=1:y=1, drawtext=fontsize=15:fontfile=/Library/Fonts/Arial\ Bold\.ttf:text='Frames\:%{n}':fontsize=52:fontcolor='white':boxcolor=0x000000AA:box=1:x=1:y=60, drawtext=fontsize=15:fontfile=/Library/Fonts/Arial\ Bold\.ttf:text='Seconds\:%{pts}':fontsize=52:fontcolor='white':boxcolor=0x000000AA:box=1:x=1:y=120, drawtext=fontsize=15:fontfile=/Library/Fonts/Arial\ Bold\.ttf:text='Framerate\:23.976':fontsize=52:fontcolor='white':boxcolor=0x000000AA:box=1:x=1:y=180" $OUTPUT_VIDEO
Create a blank video:
ffmpeg -t 600 -s 640x480 -f rawvideo -pix_fmt rgb24 -r 23.976 -i /dev/zero 10min_23976.mp4
8 - Node
8.1 - Security
Notes from Node.js Security: Pentesting and Exploitation course on StackSkills.
Vulnerabilities
Global Namespace Pollution
Be very careful with global variables.
HTTP Parameter Pollution (express)
If you pass the same url parameter multiple times in a request they will all be read as comma-separated values.
For example consider this code:
app.get('/hpp', (req, res) => {
res.send(req.query.id);
});
If we sent this request url/hpp?id=123&id=456
the response would be 123,456
, be aware of the implications of this.
eval() is Evil
Be careful with eval.
For example consider this code:
app.get('/eval', (req, res) => {
let resp = eval("(" + req.query.name + ")");
res.send(resp);
});
We could pass something like process.exit(1)
using the name
parameter and this would kill the web server.
We could leverage an exploit like this to allow a remote connection to the web server.
Remote OS Command Execution
Be careful with child_process.
For example consider this code:
var exe = require('child_process');
app.get('/os', (req, res) => {
exe.exec('ping -c 2 ' + req.query.ping, (err, data) => {
res.send(data);
});
});
We could pass something like 127.0.0.1; whoami
, and get the whoami
response back from the host.
Obviously we could do much more malicious things than determine the user.
Untrusted User Input
For example consider this code:
app.get('/hello', (req, res) => {
res.send('hello ' + req.query.name);
});
We could pass something like <img src=x onerror=alert('haha')>
and that html would be rendered allowing us to do things like XSS.
Regex DoS
Ensure you are using safe regex, if you use unsafe regex, it will be easy for attackers to induce load on your server.
Information Disclosure
Ideally we want to hide things like our tech stack/frameworks from the users (and attackers), this info can come up in headers, error pages, and cookies.
Helmet can be used to help make this easier.
To disable the x-powered-by
header you can either do app.disable('x-powered-by')
or if you are using helmet, app.use(helmet.hidePoweredBy())
Secure Code Tips
Use strict mode.
Use helmet (if you are using express) or obfuscate the same info yourself.
Things to check in a code review
- File & DB Operations
- Insecure Crypto
- Insecure SSL
- Insecure Server to Server SSL
- Logical Flaws
- Untrusted user input
8.2 - Snippets
Encoding/decoding
To base64 encode a string: Buffer.from(str).toString('base64')
To decode a base64 string: Buffer.from(str, 'base64').toString('ascii')
8.3 - Using Docker with Node
Dockerfile
This is an example of a Dockerfile for a node application.
FROM alpine:3.4
RUN apk add --no-cache --update nodejs &&\
mkdir /app
WORKDIR /app
ADD . /app
EXPOSE 3000 4300
CMD npm start
Here is a line by line breakdown:
- We start with alpine, this will help us keep only the things we absolutely need in the image
- In the first line of the run statement we install nodejs (this includes npm)
- In the second line we make a directory to add our app to
- We set our working directory to the newly created app directory
- Expose the ports your app listens on
- Use
npm start
to start the app on container startup
Building
Coming soon
Running
Coming soon
Managing App Configuration
Coming soon
9 - Security
9.1 - nmap
nmap
is a powerful tool for mapping networks (website).
Examples
nmap -p 22 -sV 10.20.21.0/24
- Scan IP’s 10.20.21.0 - 10.20.21.255 on port 22
9.2 - OpenSSL
OpenSSL is a powerful CLI for working with certificates.
Description | Command |
---|---|
Read cert | openssl x509 -in cert.pem -text |
Create domain key | openssl genrsa -out <your-domain>.key <2048 or 4096> |
Create a CSR | openssl req -new -sha256 -key <your-domain>.key -out <your-domain>.csr |
Create a Self Signed Cert | echo 01 | sudo tee ca.srl > /dev/null && openssl req -newkey rsa:4096 -nodes -sha256 -keyout domain.key -x509 -days 365 -out domain.crt |
9.3 - SSL Stripping
After we connect to a network we can see all traffic on it, which is not very useful because most important traffic is encrypted.
Install sslstrip
and dsniff
echo 1 > /proc/sys/net/ipv4/ip_forward
Add rule - iptables -t nat -A PREROUTING -p tcp --destination-port 80 -j REDIRECT --to-port 8080
Verify Rule - iptables -t nat -L PREROUTING
Ensure redirect port is open - iptables -I INPUT 1 -p tcp --dport 8080 -j ACCEPT
9.4 - Staying Anonymous
Proxy, VPN, tor
Note about location: You want to access the servers you are targeting from the same region as the typical user base to blend in
Commands below may work on other distributions but they assume you are on Kali Linux.
Tor
Shouldn’t run as root (adduser newusername)
Download from https://www.torproject.org
Traffic goes through a variety of nodes, at each node another layer of encryption is added.
After making it through the inner nodes, the exit node makes the actual request.
Difficult/Impossible to track unless somebody controlled all inner nodes (extremely unlikely).
Proxychains
Proxychains allows you to route traffic through a series of proxies.
Use dynamic_chain in most cases
Can be HTTP, SOCKS4, or SOCKS5
Always use SOCKS5
Add SOCKS5 127.0.0.1 9050
to the bottom of your /etc/proxychains.conf
Start tor - service tor start
Verify - service tor status
Verify anonymity - proxychains firefox www.dnsleaktest.com
(your IP should be in another country)
The more free proxies you use the slower things will be, select just a few free proxies with the highest uptime/reviews
VPN
Change DNS Provider from your ISP
OpenDNS is a good option.
Replace your prepend domain-name-servers ....; line with:
prepend domain-name-servers IP1, IP2;
where IP1/2 are the OpenDNS IPs.
Restart your network-manager: service network-manager restart
Verify changes: cat /etc/resolv.conf
, the output should show nameserver IP1
, nameserver IP2
as your first 2 lines.
Get & Use VPN
- Download free vpn from a site like VPN Book, note user/password.
- Unzip download.
- Make sure all browsers are closed. Navigate to the unzipped folder, run
openvpn vpnprefix-tcp443.ovpn
, vpnprefix will vary based on the package you chose. - Use credentials to login to vpn.
- Wait for
Initialization Sequence Complete
message to come up. - Verfiy: Open a browser and go do DNS Leak Test, verify your location is not your actual location. Click Standart Test and make sure your ISP is not your actual ISP.
Mac Addresses
Mac address doesn’t make it past the router.
Doesn’t really matter if you change it on a VM.
macchanger is a great easy to use tool.
Change mac address every time you boot up:
- Open crontab config:
crontab -e
- Add this line and save:
@reboot macchanger -r eth0
9.5 - WiFi
These notes are intended for Whitehat/Educational purposes only
Configure Machine
Use ifconfig
to determine device name (assume wlan0 for cmds)ifconfig wlan0 down
- take device downiwconfig wlan0 mode monitor
- set monitor modeifconfig wlan0 up
- bring device back up
airmon-ng check wlan0
- Make sure nothing is interfering with your device
Kill processes (kill network manager first)
Useful Commands
Remember to change your mac address (notes on this page).
airodump-ng wlan0 - Show wireless access points and connected devices
airodump-ng -c CHANNEL --bssid MAC -w FILENAME DEVICE - look at traffic on a specific device
aireplay-ng -0 0 -a MAC DEVICE - DOS attack a wireless network
To crack WPA2
Force user(s) to disconnect using a DOS attack, then watch them reconnect and capture the handshake.
aircrack-ng -w WORDFILE CAPFILE -e ESSID - Try to crack the capture file with a word list
crunch MINLEN MAXLEN -t PATTERN | aircrack-ng -w - CAPFILE -e ESSID - Try to crack with a pipe from crunch (this operation happens locally against the file, not live against the router)
To crack WPS
Use wash -i DEVICE
to view routers available to attack whose WPS is not locked.
Use airodump-ng
to confirm you have good enough range.
Use reaver
to orchestrate your attack.
WPS uses either a 4 or 8 digit pin consisting of only numbers, you should disable it on your router (usually enabled by default).
Many routers will disable pin access for some period of time after too many failed attempts (ratelimiting), you can adjust reaver
to try less often to avoid tripping this failsafe.
That same failsafe makes it easy to perform DOS on networks that rely on WPS authentication. If the ratelimiting locks you out completely, you can DOS it hard enough to try and force the administrator to reset their router.
DOS Attacks
Basically impossible to stop.
May need to set the channel of your wifi card iwconfig DEVICE channel CHANNEL
Then use aireplay-ng -0 0 -a MAC DEVICE
, to DOS all connected machines.
10 - Shell
10.1 - Bash
Here is an example of error handling:
some-executable
if [[ $? -ne 0 ]] ; then
echo "If we hit this block there was an error"
fi
This function can be used to determine the Operating System (at least the ones I care about):
setOs() {
MY_OS=""
case "$OSTYPE" in
darwin*) MY_OS="darwin" ;;
linux*) MY_OS="linux" ;;
esac
if [[ "$MY_OS" -eq "linux" ]] && [[ -r /etc/debian_version ]] ; then
MY_OS="debian"
elif [[ "$MY_OS" -eq "linux" ]] && [[ -r /etc/fedora-release ]] ; then
MY_OS="fedora"
elif [[ "$MY_OS" -eq "linux" ]] && [[ -r /etc/oracle-release ]] ; then
MY_OS="oel"
elif [[ "$MY_OS" -eq "linux" ]] && [[ -r /etc/centos-release ]] ; then
MY_OS="centos"
elif [[ "$MY_OS" -eq "linux" ]] && [[ -r /etc/redhat-release ]] ; then
MY_OS="rhel"
elif [[ "$MY_OS" -ne "darwin" ]] ; then
echo "Could not determine OS"
exit 1
fi
}
10.2 - Powershell
The variables $IsLinux
, $IsWindows
, & $IsMac
can be used to determine OS