[Kubeflow] ENAS RNN pipeline

Honestree 2022. 10. 20. 19:06

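Introduction

ENAS
- A NAS (neural architecture search) method that introduced parameter sharing, which dramatically reduced training time.
Pipeline
- An automated workflow built with Docker, Kubernetes, and Kubeflow that covers data preparation, model training, and serving.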
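
The parameter sharing idea mentioned for ENAS above can be illustrated with a toy sketch. This is not the ENAS-pytorch code: the names `shared_weights`, `sample_architecture`, and `weights_for` are hypothetical, and a one-element list stands in for a weight tensor. The point is only that every sampled architecture looks its weights up in one shared pool, so training one architecture also updates the weights that other architectures reuse.

```python
import random

NUM_NODES = 4
ACTIVATIONS = ["tanh", "ReLU", "identity", "sigmoid"]

# One shared weight per possible edge (parent -> child). A one-element list
# stands in for a weight tensor so that updates are visible everywhere.
shared_weights = {
    (parent, child): [random.uniform(-0.1, 0.1)]
    for child in range(1, NUM_NODES)
    for parent in range(child)
}

def sample_architecture(rng):
    """For each node after the first, pick one earlier node as parent and an activation."""
    arch = []
    for child in range(1, NUM_NODES):
        parent = rng.randrange(child)
        arch.append((parent, child, rng.choice(ACTIVATIONS)))
    return arch

def weights_for(arch):
    """Look up (not copy) the shared weights used by this architecture."""
    return {(p, c): shared_weights[(p, c)] for p, c, _ in arch}

rng = random.Random(0)
arch_a = sample_architecture(rng)
arch_b = sample_architecture(rng)

w_a = weights_for(arch_a)
w_b = weights_for(arch_b)

# Node 1 can only have node 0 as its parent, so both architectures use edge (0, 1).
w_a[(0, 1)][0] += 1.0               # a stand-in "training step" on architecture A
print(w_b[(0, 1)] is w_a[(0, 1)])   # the same object: B sees A's update
```

Because no architecture trains its own weights from scratch during the search, evaluating many candidates stays cheap.
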
GitHub link

GitHub - Hyunmok-Park/enas_pipeline

Model

The open-source ENAS implementation is used as-is.
- https://github.com/carpedm20/ENAS-pytorch

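Pipeline

Components
- train
  - Trains the ENAS controller and shared network; its output is the best architecture found.
- re-train
  - Trains the best architecture found in the train step again from scratch.
- serve
  - Serves the re-trained network as an API using the bentoML library.
- The steps above reuse the open-source code as-is.
Pipeline
- The pipeline is defined with the Python kfp library.
- Upload the compiled output file (a .tar.gz archive, or the .yaml produced by the script below) to Kubeflow to make the pipeline available.
- The Docker images are pulled from a private registry set up on a personal PC.
- How to set up the Docker private registry is covered in a separate post:
- Creating a Docker private registry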

import kfp.dsl as dsl
from kubernetes import client as k8s_client

def TrainOp(vop):
    # Search step: trains the ENAS controller and the shared weights.
    return dsl.ContainerOp(
        name="training pipeline",
        image="xx.xxx.xx.xx:5000/phm:0.1-enas-train",
        command=["sh", "run_train_container.sh"],
        # Mount the shared volume so the outputs reach the next step.
        pvolumes={"src/data": vop},
    ).add_pod_label("app", "enas-application")

def ReTrainOp(trainop):
    # Re-trains the best architecture found by the train step from scratch.
    return dsl.ContainerOp(
        name="retraining pipeline",
        image="xx.xxx.xx.xx:5000/phm:0.1-enas-retrain",
        command=["sh", "run_retrain_container.sh"],
        # Reuse the volume as left by the train step.
        pvolumes={"src/data": trainop.pvolume},
    ).add_pod_label("app", "enas-application")

def ServeOp(retrainop):
    # Serves the re-trained network as an API.
    return dsl.ContainerOp(
        name="serve pipeline",
        image="xx.xxx.xx.xx:5000/phm:0.1-enas-serve",
        command=["sh", "run_serve_container.sh"],
        # Reuse the volume as left by the re-train step.
        pvolumes={"src/data": retrainop.pvolume},
    ).add_pod_label("app", "enas-application")

def VolumnOp():
    # Wraps the existing PVC "phm-volume" so that every step can mount it.
    return dsl.PipelineVolume(
        pvc="phm-volume"
    )

@dsl.pipeline(
    name='enas_pipeline',
    description='ENAS RNN pipeline: train, re-train, serve'
)
def enas_pipeline(
):
    print('enas_pipeline')

    vop = VolumnOp()

    dsl.get_pipeline_conf().set_image_pull_secrets([k8s_client.V1LocalObjectReference(name='regcredidc')])

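    # Each step mounts the volume handed on by the previous step, so the
    # steps run in sequence: train -> re-train -> serve.
    train_and_eval = TrainOp(vop)

    retrain = ReTrainOp(train_and_eval)
    retrain.after(train_and_eval)

    serve = ServeOp(retrain)
    serve.after(retrain)  # serve must wait for re-training, not only training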

if __name__ == '__main__':
    import kfp.compiler as compiler
    # compiler.Compiler().compile(enas_pipeline, __file__ + '.tar.gz')
    compiler.Compiler().compile(enas_pipeline, __file__ + '.yaml')
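
The intended execution order from the component list (train, then re-train, then serve) can be checked without a Kubeflow cluster. The sketch below is only an illustration using the standard-library `graphlib` module (Python 3.9+). The step labels and the `dependencies` mapping are hypothetical stand-ins for the ops above, not something kfp produces.

```python
from graphlib import TopologicalSorter

# step -> set of steps it must wait for (hypothetical labels mirroring the ops)
dependencies = {
    "train": set(),
    "retrain": {"train"},
    "serve": {"retrain"},
}

# static_order() yields the steps so that every dependency comes first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

If `serve` depended only on `train`, the sorter would be free to place it before `retrain`, which is why the dependency on the re-train step matters.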

Results

Training parameters
- Dataset: PTB

python main.py --network_type rnn --dataset ptb --controller_optim adam --controller_lr 0.00035 --controller_max_step=10 --controller_hid=32 --shared_max_step=10 --shared_hid=32 --shared_embed=32 --shared_optim sgd --shared_lr 20.0 --entropy_coeff 0.0001 --num_blocks=4 --max_epoch=10 --derive_num_sample=5
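
The command above mixes the `--flag value` and `--flag=value` forms. Assuming `main.py` parses its options in the usual argparse style, both forms are equivalent. The sketch below demonstrates this with a hypothetical parser that declares only a few of the flags. The types and defaults are assumptions, not taken from the ENAS-pytorch configuration.

```python
import argparse

# Hypothetical parser covering a subset of the flags used in the command.
parser = argparse.ArgumentParser()
parser.add_argument("--network_type", type=str, default="rnn")
parser.add_argument("--dataset", type=str, default="ptb")
parser.add_argument("--controller_lr", type=float)
parser.add_argument("--controller_max_step", type=int)
parser.add_argument("--shared_hid", type=int)
parser.add_argument("--num_blocks", type=int)

argv = (
    "--network_type rnn --dataset ptb --controller_lr 0.00035 "
    "--controller_max_step=10 --shared_hid=32 --num_blocks=4"
).split()

# Space-separated and '='-separated values are parsed the same way.
args = parser.parse_args(argv)
print(args.controller_lr, args.controller_max_step, args.num_blocks)
```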

Training outputs
- .pth files holding the controller and shared weight parameters
- A JSON file describing the best-performing DAG

{"-1": [[0, "identity"]], "-2": [[0, "identity"]], "0": [[1, "identity"], [2, "sigmoid"]], "1": [[3, "ReLU"]], "2": [[4, "avg"]], "3": [[4, "avg"]], "4": [[5, "h[t]"]]}
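
The JSON above can be turned into an edge list for inspection. The sketch below assumes one reading of the format: each key is a source node, and each `[id, name]` pair is a destination node together with the function applied there. Under that reading, `-1` and `-2` would be the two cell inputs, and the nodes feeding `avg` would be the leaves whose outputs are averaged into `h[t]`. This interpretation is an assumption and should be checked against the ENAS-pytorch code.

```python
import json

dag_json = (
    '{"-1": [[0, "identity"]], "-2": [[0, "identity"]], '
    '"0": [[1, "identity"], [2, "sigmoid"]], "1": [[3, "ReLU"]], '
    '"2": [[4, "avg"]], "3": [[4, "avg"]], "4": [[5, "h[t]"]]}'
)

# JSON keys are strings, so convert them back to integer node ids.
dag = {int(src): [(dst, fn) for dst, fn in targets]
       for src, targets in json.loads(dag_json).items()}

# Flatten into (source, destination, function) edges, ordered by source id.
edges = [(src, dst, fn)
         for src, targets in sorted(dag.items())
         for dst, fn in targets]

# Nodes whose outgoing edge is "avg" are the leaves that get averaged.
leaves = sorted(src for src, _, fn in edges if fn == "avg")

for src, dst, fn in edges:
    print(f"{src:>2} -> {dst} ({fn})")
print("leaf nodes:", leaves)
```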

bentoML
