MLOps study - Raviraja Week 0: PyTorch Lightning

11 Oct 2022

I’m going to study MLOps by following Raviraja’s blog posts. Starting is half the battle!! :D

Week 0 sets up the environment for the rest of the MLOps study. Raviraja works on NLP, so keep in mind that the setup leans toward NLP and a simple classification task. (Later I’d like to extend this into MLOps that uses RL.) The project is built on the PyTorch Lightning library, which is a wrapper around PyTorch :D

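The PyTorch Lightning workflow in this post breaks down into 4 parts. The first three are Lightning classes; the last one, Inference, is a plain Python class built around the trained model. Let’s look at them in turn.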

DataModule
LightningModule
Trainer
Inference

DataModule

PyTorch Lightning uses a DataModule, which plays a role similar to PyTorch’s DataLoader but covers more ground. Data usually has to be downloaded and preprocessed before it reaches a DataLoader, and the DataModule bundles all of those steps into a single class.

< methods you need to define >

prepare_data -> download data
setup -> preprocess data
train_dataloader, val_dataloader, test_dataloader -> data loaders

< tasks performed inside the DataModule >

Download / tokenize / process
Clean and save to disk
Load inside Dataset
Apply transforms (rotate, tokenize, etc…)
Wrap inside a DataLoader (PyTorch)
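
To make the order of these steps concrete, here is a toy sketch in plain Python. It is not Lightning's implementation: `ToyDataModule`, `run_fit`, and `FAKE_DISK` are hypothetical names made up for illustration, and the real Trainer does much more. It only shows that the download step runs first, preprocessing second, and the loader is requested last.

```python
# Stand-in for files on disk; an assumption for illustration only.
FAKE_DISK = {}


class ToyDataModule:
    """Toy stand-in for a LightningDataModule; records which hooks ran."""

    def __init__(self, batch_size=2):
        self.batch_size = batch_size
        self.calls = []

    def prepare_data(self):
        # "Download" step: write raw data to disk, keep no dataset state here.
        self.calls.append("prepare_data")
        FAKE_DISK["raw"] = ["a b", "c d e", "f", "g h"]

    def setup(self, stage=None):
        # "Preprocess" step: read from disk and split each sentence on spaces.
        self.calls.append("setup")
        self.train_data = [s.split() for s in FAKE_DISK["raw"]]

    def train_dataloader(self):
        # "Wrap" step: cut the processed data into batches.
        self.calls.append("train_dataloader")
        return [
            self.train_data[i : i + self.batch_size]
            for i in range(0, len(self.train_data), self.batch_size)
        ]


def run_fit(datamodule):
    """Hypothetical driver that calls the hooks in the order described above."""
    datamodule.prepare_data()
    datamodule.setup(stage="fit")
    return datamodule.train_dataloader()


dm = ToyDataModule(batch_size=2)
batches = run_fit(dm)
print(dm.calls)
print(len(batches), "batches")
```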

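import torch
import pytorch_lightning as pl
from datasets import load_dataset
from transformers import AutoTokenizer


class DataModule(pl.LightningDataModule):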
    def __init__(self, model_name="google/bert_uncased_L-2_H-128_A-2", batch_size=32):
        super().__init__()

        self.batch_size = batch_size
        self.tokenizer = AutoTokenizer.from_pretrained(model_name) # tokenizer that matches the BERT checkpoint

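    def prepare_data(self):
        # Download (or load from the local cache) the CoLA subset of GLUE.
        # Note: Lightning calls prepare_data on a single process only, so for
        # multi-GPU runs it is safer to assign these splits in setup() instead.
        cola_dataset = load_dataset("glue", "cola")
        self.train_data = cola_dataset["train"]
        self.val_data = cola_dataset["validation"]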

    def tokenize_data(self, example):
        # processing the data
        return self.tokenizer(
            example["sentence"],
            truncation=True,
            padding="max_length",
            max_length=256,
        )

    def setup(self, stage=None):
        if stage == "fit" or stage is None:
            self.train_data = self.train_data.map(self.tokenize_data, batched=True)
            self.train_data.set_format(
                type="torch", columns=["input_ids", "attention_mask", "label"]
            )

            self.val_data = self.val_data.map(self.tokenize_data, batched=True)
            self.val_data.set_format(
                type="torch", columns=["input_ids", "attention_mask", "label"]
            )

    def train_dataloader(self):
        return torch.utils.data.DataLoader(
            self.train_data, batch_size=self.batch_size, shuffle=True
        )

    def val_dataloader(self):
        return torch.utils.data.DataLoader(
            self.val_data, batch_size=self.batch_size, shuffle=False
        )
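
The `tokenize_data` method above passes `truncation=True`, `padding="max_length"`, and `max_length=256` to the tokenizer. As a rough illustration of what those options do to one sentence, here is a plain-Python sketch on lists of token ids. `pad_or_truncate` is a hypothetical helper, the ids are made up, and a pad id of 0 is an assumption; the real tokenizer also takes care of special tokens when truncating, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]      # truncation: drop anything too long
    mask = [1] * len(ids)                   # 1 marks a real token
    n_pad = max_length - len(ids)           # padding needed to reach max_length
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


# A short sentence gets padded, a long one gets cut.
short_ids, short_mask = pad_or_truncate([11, 22, 33], max_length=5)
long_ids, long_mask = pad_or_truncate([11, 1, 2, 3, 4, 5, 33], max_length=5)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Because every example ends up with the same length, the default DataLoader collation can stack them into one tensor per batch, and the attention mask tells the model which positions to ignore.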

LightningModule

Just as a plain PyTorch model inherits from torch.nn.Module, a PyTorch Lightning model inherits from pl.LightningModule. With nn.Module you only had to define forward; here you need to define a few additional methods. (See the official documentation.)

< methods you need to define >

forward -> model forward
training_step -> loss computation for one batch (Lightning runs the backward pass and the optimizer update from the returned loss)
validation_step
test_step (optional)
configure_optimizers -> Optimizer initialization

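import torch
import torch.nn as nn
import torch.nn.functional as F
import pytorch_lightning as pl
from sklearn.metrics import accuracy_score
from transformers import AutoModel


class ColaModel(pl.LightningModule):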
    def __init__(self, model_name="google/bert_uncased_L-2_H-128_A-2", lr=1e-2):
        super(ColaModel, self).__init__()
        self.save_hyperparameters()

        self.bert = AutoModel.from_pretrained(model_name)
        self.W = nn.Linear(self.bert.config.hidden_size, 2)
        self.num_classes = 2

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)

        h_cls = outputs.last_hidden_state[:, 0]
        logits = self.W(h_cls)
        return logits

    def training_step(self, batch, batch_idx):
        logits = self.forward(batch["input_ids"], batch["attention_mask"])
        loss = F.cross_entropy(logits, batch["label"])
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def validation_step(self, batch, batch_idx):
        logits = self.forward(batch["input_ids"], batch["attention_mask"])
        loss = F.cross_entropy(logits, batch["label"])
        _, preds = torch.max(logits, dim=1)
        val_acc = accuracy_score(preds.cpu(), batch["label"].cpu())
        val_acc = torch.tensor(val_acc)
        self.log("val_loss", loss, prog_bar=True)
        self.log("val_acc", val_acc, prog_bar=True)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams["lr"])
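
To see what `training_step` and `validation_step` compute, it can help to redo the arithmetic by hand. The sketch below is plain Python with made-up logits and labels, not the code the model runs: `cross_entropy` and `batch_metrics` are hypothetical helpers that mirror the mean cross-entropy loss, the argmax prediction, and the accuracy used above.

```python
import math


def cross_entropy(logits, label):
    """Negative log of the softmax probability assigned to the true class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def batch_metrics(batch_logits, labels):
    """Mean loss, argmax predictions, and accuracy for one batch."""
    losses = [cross_entropy(l, y) for l, y in zip(batch_logits, labels)]
    preds = [max(range(len(l)), key=l.__getitem__) for l in batch_logits]
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    return sum(losses) / len(losses), preds, acc


# Three examples, two classes (0 = unacceptable, 1 = acceptable).
batch_logits = [[2.0, 0.0], [0.5, 1.5], [-1.0, 1.0]]
labels = [0, 1, 0]

loss, preds, acc = batch_metrics(batch_logits, labels)
print(f"loss={loss:.4f} preds={preds} acc={acc:.4f}")
```

The third example is confidently wrong, so it dominates the mean loss even though two out of three predictions are correct.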

Trainer

The Trainer takes the DataModule and the PyTorch Lightning model and runs the training loop for you. You could see it as an approach similar to TensorFlow’s Session.

< examples of options the Trainer can use >

logging
gradient accumulation
half precision training
distributed computing
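
Of these options, gradient accumulation is the easiest to check by hand. The idea is to sum the gradients of several small batches before taking one optimizer step, which imitates a larger batch. The sketch below is plain Python on a one-parameter model with made-up data, not Lightning code; `grad_mse` is a hypothetical helper. In the Trainer the corresponding argument is `accumulate_grad_batches`.

```python
def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of mean((w * x - y) ** 2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Gradient on the full batch of 4 examples.
full_grad = grad_mse(w, xs, ys)

# Same data as 2 micro-batches of 2 examples, accumulated before one update.
accumulate = 2
micro_batch_size = 2
accumulated_grad = 0.0
for i in range(0, len(xs), micro_batch_size):
    micro_grad = grad_mse(w, xs[i : i + micro_batch_size], ys[i : i + micro_batch_size])
    accumulated_grad += micro_grad / accumulate  # scale so the sum is an average

print(full_grad, accumulated_grad)
```

With equally sized micro-batches and a mean loss, the two gradients match, so the effective batch size is the DataLoader batch size times the number of accumulated batches.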

< Loggers >

TensorboardLogger
WandbLogger

< Callbacks > (see the official documentation)

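import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

cola_data = DataModule()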
cola_model = ColaModel()

checkpoint_callbacks = [
    ModelCheckpoint(dirpath="./models", monitor="val_loss", mode="min"), # Save model
    EarlyStopping(monitor="val_loss", patience=3, verbose=True, mode="min"),
]

trainer = pl.Trainer(
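    # `gpus` works on the Lightning 1.x releases this post was written for;
    # Lightning 2.x removed it in favor of accelerator="auto", devices=1.
    gpus=(1 if torch.cuda.is_available() else 0),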
    max_epochs=1,
    fast_dev_run=False, # True: one batch training one validation -> for debugging
    logger=pl.loggers.TensorBoardLogger("logs/", name="cola", version=1), # directory: logs/cola
    # logger = pl.loggers.WandbLogger(name='cola',project='pytorchlightning')
    callbacks=checkpoint_callbacks,
)
trainer.fit(cola_model, cola_data)
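
The two callbacks above both monitor `val_loss` with `mode="min"`. The following plain-Python sketch shows the bookkeeping behind that idea on a made-up loss history. It is a simplified illustration, not Lightning's implementation, and `early_stop_epoch` is a hypothetical helper. Note that with `max_epochs=1` as in the Trainer above, early stopping never gets a chance to trigger.

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch index at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:  # mode="min": lower is better
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, one per epoch.
history = [0.70, 0.62, 0.60, 0.61, 0.63, 0.60, 0.65]

stop_epoch = early_stop_epoch(history, patience=3)
# The epoch a "keep the best checkpoint" rule would save (first minimum).
best_epoch = min(range(len(history)), key=history.__getitem__)

print("stop at epoch", stop_epoch, "best epoch", best_epoch)
```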

Inference

MLOps separates the model’s training and inference modules. That way, even while training is still running on the server, you can freeze a model, manage its versions, and debug it.

< methods you need to define >

predict

< tasks performed inside Inference >

Load the trained model
Get the input
Convert the input into the required format
Get the predictions

class ColaPredictor:
    def __init__(self, model_path):
        self.model_path = model_path
        # loading the trained model
        self.model = ColaModel.load_from_checkpoint(model_path)
        # keep the model in eval mode
        self.model.eval()
        self.model.freeze()
        self.processor = DataModule()
        self.softmax = torch.nn.Softmax(dim=0)
        self.labels = ["unacceptable", "acceptable"]  # CoLA: 0 = unacceptable, 1 = acceptable

    def predict(self, text):
        # text => run time input
        inference_sample = {"sentence": text}
        # tokenizing the input
        processed = self.processor.tokenize_data(inference_sample)
        # predictions (the extra [] adds a batch dimension of size 1)
        logits = self.model(
            torch.tensor([processed["input_ids"]]),
            torch.tensor([processed["attention_mask"]]),
        )
        # logits[0] is the single example in the batch; softmax turns it into scores
        scores = self.softmax(logits[0]).tolist()
        predictions = []
        for score, label in zip(scores, self.labels):
            predictions.append({"label": label, "score": score})
        return predictions
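
The last part of `predict` is just a softmax followed by pairing each score with its label. Here is that step alone in plain Python, with made-up logits, so the output format is easy to see. `softmax` and `to_predictions` are hypothetical helpers written for this sketch; they do not load or call the trained model.

```python
import math

LABELS = ["unacceptable", "acceptable"]


def softmax(logits):
    """Turn raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_predictions(logits, labels=LABELS):
    """Pair each label with its score, in the same format predict() returns."""
    scores = softmax(logits)
    return [{"label": label, "score": score} for label, score in zip(labels, scores)]


predictions = to_predictions([-1.0, 1.0])
top = max(predictions, key=lambda p: p["score"])

print(predictions)
print("top label:", top["label"])
```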

Honestly, it doesn’t feel like a huge change, but as the saying goes, if PyTorch is the ice cream, then PyTorch Lightning is the cherry on top. I’m still not sure which features count as MLOps, but given how compatible it is with PyTorch, I expect to be able to use those features much more easily :o

Download the ipynb file

Jae-Kyung Cho Being unique is better than being perfect
