Machine Learning

These pages come from beverts312/machine-learning

GPUs in the cloud

| Service | Pricing | Template Support | Notes |
|---------|---------|------------------|-------|
| Replicate | Relatively expensive but per-second billing (link) | A+, can expose an API to run a model directly | Super nice for trying out new models and running on demand, though the price will add up quickly |
| vast.ai | Cheap (link) | No? | Can easily rent out your own hardware |
| RunPod (referral link) | Cheap (link) | Yes, for environments | |
| Google Colab | Free option (link) | Yes (a Jupyter notebook) | Easy to use, good for trying out notebooks that don't require a ton of resources |

Text to Image

Tools

| Tool | Run yourself | Notes |
|------|--------------|-------|
| Stable Diffusion | Yes | Open source, easy to make your own models |
| DALL-E | No | SaaS offering from OpenAI, has an API + web UI |
| Midjourney | No | SaaS offering, has a web UI + Discord bot |
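
Since Stable Diffusion is the only one of these you can run yourself, here is a minimal sketch of generating an image locally with the Hugging Face diffusers library (the model ID, prompt, and output filename are just examples, and a CUDA GPU is assumed):

```python
# Minimal text-to-image sketch using Hugging Face diffusers (assumes a CUDA GPU).
# The model ID and prompt are illustrative; any Stable Diffusion checkpoint works.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("astronaut.png")
```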

Prompt Help

Writing a good prompt is hard; here are some resources to help you out:

1 - beverts312/mlbase

Base Docker image for machine learning projects.

2 - Machine Learning

For learning machine learning ;)

Docs are managed in this repo but are best viewed in their Hugo form here.

3 - Misc

3.1 - Github Copilot

AI Assisted Coding

Github Copilot is a new feature that allows you to write code using AI. It is currently in beta and is available for free to all Github users. It is currently available for Python, JavaScript, TypeScript, and Java.

The statement above was actually written by Copilot when I opened this file and started typing. It is also worth noting that the last statement of that paragraph leaves out the fact that it can help in any language used to develop software in the open (including Markdown, obviously, as it wrote that statement in this file). GitHub Copilot is incredibly useful and can greatly increase your productivity. It is not perfect, but it is getting better every day. It is also worth noting that it is not a replacement for a human programmer, but it can help you write code faster and more efficiently (also written by Copilot). I use it inside VS Code using the Copilot extension, but there are integrations available for a variety of editors and IDEs.

I would definitely recommend trying it out; there is a 60-day free trial.

3.2 - Text Inversion (Dreambooth) Training

Text Inversion allows us to further train the stable diffusion model on specific keys and then use those keys to generate images.

Data Preparation

The first thing you need to do in order to train a new model is to prepare the training data. You will need 2 sets of images:

  • Training Images - These should be images of the thing you want to train on, for example if you want to teach your model about a new person, this should be pictures of that person (more details below)
  • Regularization Images - These should be images that are not of the thing you want to train on but are of the same class, for example if you want to teach your model about a new person, this should include images of other people. The GitHub user djbielejeski has a number of datasets that can be used for this purpose; in general the repo names follow the pattern https://github.com/djbielejeski/Stable-Diffusion-Regularization-Images-${CLASS}, for example Stable-Diffusion-Regularization-Images-person_ddim contains images of people that can be used for regularization (helper script). An alternative to using a dataset like this is to generate a batch of regularization images using Stable Diffusion itself (see the sketch after this list). In either case you will likely want to use about 200 images. You do not need to provide regularization images if training on Replicate
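
If you decide to generate the regularization images yourself with Stable Diffusion, a rough sketch using the Hugging Face diffusers library might look like this (the model ID, class prompt, and output directory are placeholders; adjust them for your class):

```python
# Sketch: generate ~200 regularization images of the class "person" with Stable Diffusion.
# Model ID, prompt, and output directory are illustrative; a CUDA GPU is assumed.
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

os.makedirs("regularization_data", exist_ok=True)
for i in range(200):
    image = pipe("a photo of a person", height=512, width=512).images[0]
    image.save(f"regularization_data/person_{i:03d}.png")
```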

All images should be 512x512 and in the png format.
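
If your source photos are not already 512x512 PNGs, a small Pillow script along these lines can do the conversion (a sketch only; the directory names are placeholders and a simple center crop is assumed):

```python
# Sketch: center-crop and resize images to 512x512 PNGs.
# Directory names are illustrative; assumes Pillow is installed (pip install Pillow).
from pathlib import Path
from PIL import Image

SRC = Path("raw_images")
DST = Path("training_data")
DST.mkdir(exist_ok=True)

for path in SRC.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((512, 512), Image.LANCZOS)
    img.save(DST / f"{path.stem}.png")
```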

For training images, you will need a variety of images of the thing you want to train on. They should be from different angles, different zoom levels, and with different backgrounds. I will update the table below as I experiment with more classes.

| Class | Image distribution |
|-------|--------------------|
| person | 2-3 full body, 3-5 upper body, 5-12 close up on face |

Training

In order to train a new model you will need a high-end graphics card. I won't be too specific because this is a rapidly changing space, but you will likely have trouble with less than 24 GB of VRAM. Since I do not have that kind of hardware I have been using RunPod (referral link), which offers pretty cheap GPUs in the cloud. If using RunPod, I would recommend using a configuration with:

  • 1x GPU with 24GB of VRAM
  • Default Container Disk Size
  • 40 GB Volume Disk Size
  • RunPod Stable Diffusion v1.5 (at time of writing v2 is available but does not seem to work as well, steps are roughly the same for v2)

I would recommend deploying on-demand, but you can roll the dice and try to save money with a spot instance.

After the pod is provisioned I would connect to the web UI (“Connect via HTTP [Port 3000]” in the UI) in one tab and connect to JupyterLab in another tab.

In the JupyterLab tab create 2 directories:

  • training_data - Place your training images in here
  • regularization_data - Place your regularization images in here

Then in the web UI tab:

  1. Go to the Dreambooth tab
  2. On create model, enter a name for your model, select a source checkpoint and click create
  3. After the model is created, move to the train tab
  4. Set the instance prompt to something that describes the training set using a key that you want to use in future generations, so if the key was “bge” the instance prompt could be “a photo of bge”
  5. Set the class prompt to something that describes the regularization data, so continuing with the previous example you could use “a photo of a person”
  6. Set the dataset directory to /workspace/training_data
  7. Set the regularization dataset directory to /workspace/regularization_data
  8. Set the number of training steps as desired; I would recommend starting with 1500
  9. Click train (should take 10-30 minutes depending on the hardware and dataset size)

Trying your model out: After training is completed you can go back to the txt2img tab and try it out. If you don't see your model available, you may need to click the refresh icon next to the checkpoint dropdown.

Getting your model out: You can find the created checkpoints under /workspace/stable-diffusion-webui/models/Stable-Diffusion/${MODEL_NAME}_${STEPS}.ckpt in your JupyterLab tab; from there you can download them and use them in the future. An alternative to downloading the model locally is to open a terminal in your JupyterLab tab and upload it to a cloud storage provider like Wasabi or S3.
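
As a sketch of that upload step using boto3 (the bucket name, endpoint, and checkpoint filename are placeholders; Wasabi is S3-compatible, so the same client works for both):

```python
# Sketch: upload a trained checkpoint to S3-compatible storage (S3 or Wasabi).
# Bucket name, endpoint URL, and checkpoint filename are placeholders.
# Assumes boto3 is installed and credentials are configured.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.wasabisys.com",  # omit endpoint_url for plain AWS S3
)
s3.upload_file(
    "/workspace/stable-diffusion-webui/models/Stable-Diffusion/my_model_1500.ckpt",
    "my-models-bucket",
    "my_model_1500.ckpt",
)
```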

Using Replicate

An alternate option is to use Replicate; however, this is going to be much more expensive ($2.50 at a minimum, probably a fair bit more with a realistically sized dataset). They have a nice blog post here on how to set that up. Here you can find a little script I wrote to simplify that process.

An advantage of this approach, other than how easy it is to run training, is that your model is immediately available for use via the Replicate API. I plan to put together a little sample to make it easy to use the Replicate API to run models trained using an alternate method, such as the one described in the section above.
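
As a rough sketch of what calling a trained model through the Replicate Python client looks like (the model identifier and prompt are placeholders; the actual model name and version hash come from your Replicate account):

```python
# Sketch: run a fine-tuned model via the Replicate API.
# Requires `pip install replicate` and the REPLICATE_API_TOKEN environment variable.
# The model identifier and prompt below are placeholders.
import replicate

output = replicate.run(
    "your-username/your-dreambooth-model:VERSION_HASH",
    input={"prompt": "a photo of bge on a mountain top"},
)
print(output)  # typically a list of image URLs
```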

Useful Resources

4 - Tools

I have dockerized a number of open source tools for machine learning and data science. For each tool you will find a docker-compose.yml, a Dockerfile, and an info.yml. The info.yml provides a standardized view of how to leverage the tool. I have written scripts that use the information defined in the info.yml files to make the tools easy to use in a consistent manner.

Getting Started

In order for the scripts to work, you will need to install the following:

  • python3/pip3/virtualenv (then run pip3 install -r requirements.txt or pip3 install -r dev-requirements.txt for development)
  • docker+docker-compose
  • NVIDIA Container Toolkit

An interactive script (prepare.py) is provided to help:

  • initialize volume directories used by the tools
  • download required datasets/models/checkpoints

Docker

If you are using these for production or don't want bloat I would recommend using your own images; these images are geared towards making things easy, not towards being optimized.

The compose file in each tool directory knows how to build the images (which are not currently on Docker Hub). All of the tools extend one of the base images defined in this repo:

A lot of data is downloaded in the course of building these images; if you need to share them across multiple machines on a local network I would recommend using a local registry (example config).

4.1 - Machine Learning Tools

Contains a set of dockerized open source ML tools.

4.2 - Background Remover

This tool removes the background from images or videos. The tool being wrapped is nadermx/backgroundremover

Docker tooling for nadermx/backgroundremover. Can process image or video.

4.3 - Diffusion Clip

This makes it easy to train and apply diffusion models to images. The tool being wrapped is gwang-kim/DiffusionCLIP.

Docker tooling for gwang-kim/DiffusionCLIP.

The entrypoint is a highly opinionated wrapper around the edit-single-image operation; to do other things or to override options, override the entrypoint.

Args

Example: docker-compose run dc --model_path pretrained/imagenet_cubism_t601.pth --config imagenet.yml --img_path ../working/test.jpg

| Flag | Value |
|------|-------|
| --model_path | pretrained/${Name of model you put in checkpoints dir} |
| --config | Either celeba.yml, imagenet.yml, afhq.yml or `` (read in source repo) |
| --img_path | ../working/${Name of file you want to process in working dir} |

Volumes

Be sure to read in the source repo about what needs to go into pretrained (checkpoints) and data.

| Local Path | Purpose |
|------------|---------|
| ../../volumes/checkpoints | Pretrained models |
| ../../volumes/data | Data to train on |
| ../../volumes/working | Directory to put files into for processing and to get processed files out of |
| ../../volumes/cache | Python cache |

4.4 - GFPGAN

This tool enhances images in a number of ways; the tool being wrapped is TencentARC/GFPGAN.

Docker tooling for TencentARC/GFPGAN. Upscales images, fixes faces.

4.5 - Maxim

This tool can denoise, dehaze, deblur, derain, and enhance images. The tool being wrapped is google-research/maxim.

Docker tooling for google-research/maxim.

Put input images in ../../volumes/working/input.

Run docker-compose run ml $OPERATION where $OPERATION is one of: Denoising, Deblurring, Dehazing-Indoor, Dehazing-Outdoor, Deraining-Streak, Deraining-Drop, Enhancement.

4.6 - Real Esrgan

This tool enhances images in a number of ways; the tool being wrapped is xinntao/Real-ESRGAN.

Docker tooling for xinntao/Real-ESRGAN. Upscales images and fixes faces (using gfpgan).

Options:

  • -s - Scale factor (default: 4)
  • --face_enhance - Enhance face using GFPGAN (default: False)
  • --fp32 - Use fp32 precision during inference. Default: fp16 (half precision).

Run on Replicate