Machine Learning
These pages come from beverts312/machine-learning
GPUs in the Cloud
Service | Pricing | Template Support | Notes |
---|---|---|---|
Replicate | Relatively expensive but per second billing (link) | A+, can expose an API to run a model directly | Super nice for trying out new models and running on demand, price will add up quickly though |
vast.ai | Cheap (link) | No? | Can easily rent out your own hardware |
RunPod (referral link) | Cheap (link) | Yes, for environments | |
Google Colab | Free option (link) | Yes (a Jupyter notebook) | Easy to use, good for trying out notebooks that don't require a ton of resources |
Text to Image
Tool | Run yourself | Notes |
---|---|---|
Stable Diffusion | Yes | Open source, easy to make your own models |
Dall-E | No | SaaS offering from OpenAI, has API + web UI |
Midjourney | No | SaaS offering, has web UI + Discord bot |
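For the "run yourself" option, here is a minimal sketch of generating an image locally with Stable Diffusion through the Hugging Face diffusers library; the checkpoint id and prompt are just examples, and a CUDA GPU is assumed.

```python
# Minimal sketch: local text-to-image with Stable Diffusion via diffusers.
# Assumes: pip install diffusers transformers accelerate torch, and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint on Hugging Face
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a watercolor painting of a lighthouse at sunset").images[0]
image.save("lighthouse.png")
```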
Prompt Help
Writing a good prompt is hard, here are some resources to help you out:
1 - beverts312/mlbase
Base docker image for machine learning projects.
2 - Machine Learning
For learning machine learning ;)
Docs are managed in this repo but are best viewed in their Hugo form here.
3.1 - Github Copilot
AI Assisted Coding
Github Copilot is a new feature that allows you to write code using AI. It is currently in beta and is available for free to all Github users. It is currently available for Python, JavaScript, TypeScript, and Java.
The statement above was actually written by Copilot when I opened this file and started trying to type. It is also worth noting that the last statement of that paragraph leaves out the fact that it can help with any language that is used to develop software in the open (including Markdown, obviously, as it wrote that statement in this file). Github Copilot is incredibly useful and can greatly increase your productivity. It is not perfect, but it is getting better every day. It is also worth noting that it is not a replacement for a human programmer, but it can help you write code faster and more efficiently (also written by Copilot). I use it inside VS Code using the Copilot extension, but there are integrations available for a variety of editors and IDEs.
I would definitely recommend trying it out; there is a 60-day free trial.
3.2 - Text Inversion (Dreambooth) Training
Text Inversion allows us to further train the stable diffusion model on specific keys and then use those keys to generate images.
Data Preparation
The first thing you need to do in order to train a new model is to prepare the training data. You will need 2 sets of images:
- Training Images - These should be images of the thing you want to train on, for example if you want to teach your model about a new person, this should be pictures of that person (more details below)
- Regularization Images - These should be images that are not of the thing you want to train on but are of the same class, for example if you want to teach your model about a new person, this should include images of other people. The GitHub user djbielejeski has a number of datasets that can be used for this purpose; in general the repo names follow the pattern https://github.com/djbielejeski/Stable-Diffusion-Regularization-Images-${CLASS}, for example Stable-Diffusion-Regularization-Images-person_ddim contains images of people that can be used for regularization (helper script). An alternative to using a dataset like this would be to create a bunch of regularization images using stable-diffusion itself. In either case you will likely want to use about 200 images. You do not need to provide regularization images if training on Replicate.
All images should be 512x512 and in the png format.
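If your source photos are not already square PNGs, a small script can handle the conversion; below is a minimal sketch using Pillow, where the input and output directory names are placeholders.

```python
# Minimal sketch: center-crop and resize images to 512x512 PNGs with Pillow.
# Assumes Pillow is installed (pip install Pillow); directory names are placeholders.
from pathlib import Path

from PIL import Image, ImageOps

src = Path("raw_images")     # placeholder: your original photos
dst = Path("training_data")  # output directory of 512x512 PNGs
dst.mkdir(exist_ok=True)

for path in src.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    img = Image.open(path).convert("RGB")
    img = ImageOps.fit(img, (512, 512))  # center-crop to square, then resize
    img.save(dst / f"{path.stem}.png")
```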
For training images, you will need a variety of images of the thing you want to train on. They should be from different angles, different zoom levels, and with different backgrounds. I will update the table below as I experiment with more classes.
class | image distribution |
---|---|
person | 2-3 full body, 3-5 upper body, 5-12 close up on face |
Training
In order to train a new model you will need a high-end graphics card. I won't be specific because this is a rapidly changing space, but you will likely have trouble with less than 24GB of VRAM.
Since I do not have that kind of hardware I have been using RunPod (referral link), which offers pretty cheap GPUs in the cloud. If using RunPod, I would recommend using a configuration with:
- 1x GPU with 24GB of VRAM
- Default Container Disk Size
- 40 GB Volume Disk Size
- RunPod Stable Diffusion v1.5 (at time of writing v2 is available but does not seem to work as well, steps are roughly the same for v2)
I would recommend deploying on-demand, but you can roll the dice and try to save money with a spot instance.
After the pod is provisioned I would connect to the web UI (“Connect via HTTP [Port 3000]” in the UI) in one tab and connect to JupyterLab in another tab.
In the JupyterLab tab create 2 directories:
- `training_data` - Place your training images in here
- `regularization_data` - Place your regularization images in here
Then in the web UI tab:
- Go to the Dreambooth tab
- On the create model tab, enter a name for your model, select a source checkpoint, and click create
- After the model is created, move to the train tab
- Set the instance prompt to something that describes the training set using a key that you want to use in future generations, so if the key was “bge” the instance prompt could be “a photo of bge”
- Set the class prompt to something that describes the regularization data, so continuing with the previous example you could use “a photo of a person”
- Set the dataset directory to `/workspace/training_data`
- Set the regularization dataset directory to `/workspace/regularization_data`
- Set the number of training steps as desired, I would recommend starting with 1500
- Click train (should take 10-30 minutes depending on the hardware and dataset size)
Trying your model out: After training is completed you can go back to the txt2img tab and try it out. If you don't see your model available you may need to click the refresh icon next to the checkpoint dropdown.
Getting your model out: You can find the created checkpoints under `/workspace/stable-diffusion-webui/models/Stable-Diffusion/${MODEL_NAME}_${STEPS}.ckpt` in your JupyterLab tab; from there you can download them and use them in the future.
An alternative to downloading it locally could be to open a terminal in your JupyterLab tab and upload the model to a cloud storage provider like Wasabi or S3.
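As a rough sketch of that upload step, assuming boto3 and credentials are set up in the pod (the endpoint, bucket, and checkpoint names below are placeholders):

```python
# Minimal sketch: upload a trained checkpoint to S3-compatible storage with boto3.
# Assumes boto3 is installed (pip install boto3) and credentials are configured;
# the endpoint, bucket, and checkpoint path are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.wasabisys.com",  # omit for plain AWS S3
)

checkpoint = "/workspace/stable-diffusion-webui/models/Stable-Diffusion/my-model_1500.ckpt"
s3.upload_file(checkpoint, "my-model-bucket", "checkpoints/my-model_1500.ckpt")
```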
Using Replicate
An alternate option is to use Replicate, however this is going to be much more expensive ($2.50 at a minimum, probably a fair bit more with a realistic dataset size). They have a nice blog post here on how to set that up. Here you can find a little script I wrote to simplify that process.
An advantage to this approach, other than how easy it is to run training, is that your model is immediately available for use via the Replicate API. I plan to put together a little sample to make it easy to use the Replicate API to run models trained using an alternate method such as the one described in the section above.
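In the meantime, here is a minimal sketch of calling a Replicate-hosted model from Python; the model reference, version, and prompt are placeholders.

```python
# Minimal sketch: run a model hosted on Replicate from Python.
# Assumes: pip install replicate, and REPLICATE_API_TOKEN set in the environment.
# The model reference and prompt below are placeholders.
import replicate

output = replicate.run(
    "your-username/your-dreambooth-model:version-id",  # placeholder model:version
    input={"prompt": "a photo of bge on a mountain top"},
)
print(output)  # typically a list of output image URLs
```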
Useful Resources
4 - Tools
I have dockerized a number of open source tools for machine learning and data science.
For each tool you will find a `docker-compose.yml`, `Dockerfile`, and an `info.yml`. The `info.yml` provides a standardized view of how to leverage the tool.
I have written scripts that use the information defined in the `info.yml` files to make the tools easy to use in a consistent manner.
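As an illustration of the idea (not the actual schema, which lives in the repo), such a script might load an `info.yml` with PyYAML along these lines:

```python
# Minimal sketch: read a tool's info.yml with PyYAML (pip install pyyaml).
# The path is hypothetical; see the repo for the real schema and scripts.
from pathlib import Path

import yaml

info = yaml.safe_load(Path("some-tool/info.yml").read_text())
print(info)  # standardized description of how to run the tool
```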
Getting Started
In order for the scripts to work, you will need to install the following:
- python3/pip3/virtualenv (then run `pip3 install -r requirements.txt`, or `pip3 install -r dev-requirements.txt` for development)
- docker + docker-compose
- NVIDIA Container Toolkit
An interactive script (prepare.py) is provided to help:
- initialize volume directories used by the tools
- download required datasets/models/checkpoints
Docker
If you are using these for production or don't want bloat, I would recommend using your own images; these images are geared towards making things easy, not optimized.
The compose file in each tool directory knows how to build the images (which are not currently on docker hub).
All of the tools extend one of the base images defined in this repo:
A lot of data is downloaded in the course of building these images; if you need to share them across multiple machines on a local network, I would recommend using a local registry (example config).
4.1 -
Contains a set of dockerized open source ML tools.
4.2 - Background Remover
This tool removes the background from images or videos. The tool being wrapped is nadermx/backgroundremover.
Docker tooling for nadermx/backgroundremover. Can process image or video.
4.3 - Diffusion Clip
This makes it easy to train and apply diffusion models to images. The tool being wrapped is gwang-kim/DiffusionCLIP.
Docker tooling for gwang-kim/DiffusionCLIP.
The entrypoint is a highly opinionated wrapper on the edit single image operation; to do other things or to override options, override the entrypoint.
Args
Example: `docker-compose run dc --model_path pretrained/imagenet_cubism_t601.pth --config imagenet.yml --img_path ../working/test.jpg`
Flag | Value |
---|---|
--model_path | pretrained/${Name of model you put in checkpoints dir} |
--config | Either celeba.yml, imagenet.yml, afqh.yml, or `` (read in source repo) |
--img_path | ../working/{Name of file you want to process in working dir} |
Volumes
Be sure to read in the source repo about what needs to go into `pretrained` (checkpoints) and `data`.
Local Path | Purpose |
---|---|
../../volumes/checkpoints | Pretrained models |
../../volumes/data | Data to train on |
../../volumes/working | Directory to put files into for processing and to get processed files out of |
../../volumes/cache | Python cache |
4.4 - GFPGAN
This tool enhances images in a number of ways; the tool being wrapped is TencentARC/GFPGAN.
Docker tooling for TencentARC/GFPGAN. Upscales images, fixes faces.
4.5 - Maxim
This tool can denoise, dehaze, deblur, derain, and enhance images.
The tool being wrapped is google-research/maxim.
Docker tooling for google-research/maxim.
Put input images in `../../volumes/working/input`.
Run `docker-compose run ml $OPERATION` where `$OPERATION` is one of: `Denoising`, `Deblurring`, `Dehazing-Indoor`, `Dehazing-Outdoor`, `Deraining-Streak`, `Deraining-Drop`, `Enhancement`.
4.6 - Real Esrgan
This tool enhances images in a number of ways; the tool being wrapped is xinntao/Real-ESRGAN.
Docker tooling for xinntao/Real-ESRGAN. Upscales images and fixes faces (using gfpgan).
Options:
- `-s` - Scale factor (default: 4)
- `--face_enhance` - Enhance face using GFPGAN (default: False)
- `--fp32` - Use fp32 precision during inference. Default: fp16 (half precision).
Run on Replicate