Transformers ONNX#

transforemrs์—์„œ ๋ชจ๋ธ๋“ค์„ ONNX๋กœ ๋ณ€ํ™˜ํ•˜๊ธฐ#

Open Neural Network Exchange (ONNX) is an ecosystem for building machine learning models: a standard that lets a model trained in one framework be executed in others through a common runtime session.
The important part is this: it improves performance in production (the project's own tagline is that it "helps increase the speed of innovation in the AI community").
So this section is a write-up of my trial and error applying ONNX to transformers.

model#

tansformers์—์„œ๋Š” ๊ฐ์ข… ๋…ผ๋ฌธ์—์„œ ์–ธ๊ธ‰ํ•œ bpe ๊ฐ™์€ ํ† ํฌ๋‚˜์ด์ €์™€, ์ž…๋ ฅ๊ฐ’, ๋ ˆ์ด์ €๋ฅผ ๋™์ผํ•˜๊ฒŒ ๋งŒ๋“ค์–ด๋†จ๋‹ค.
๊ทธ๋ž˜์„œ ONNX๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ์œ„ํ•œ ์ž…๋ ฅ์ด ๋Œ€์ฒด๋กœ ๋‹ค๋ฅด๋‹ค.
์ด ์ž…๋ ฅ๊ฐ’์ด ๋™์ผํ•˜๊ฒŒ ์‚ฌ์šฉ๋˜๊ฑฐ๋‚˜ shape_inference.infer_shapes์˜ ํ•จ์ˆ˜๋กœ ๋ถˆ๋Ÿฌ์˜ค๊ณ  ์ ์šฉ๋งŒ ํ•œ๋‹ค๋ฉด ์ข‹์œผ๋ จ๋งŒ ์•„์‰ฝ๊ฒŒ๋„ ์•„์ง ์™„๋ฒฝํ•˜๊ฒŒ ์ ์šฉ๋˜์ง€ ์•Š๋Š”๋‹ค.
๊ทธ๋ž˜์„œ ๊ฐ ๋ชจ๋ธ์— ๋Œ€ํ•œ ONNX ๋ณ€ํ™˜์„ ์‹œ๋„ํ•ด๋ณธ๋‹ค.

The process is simple: convert the model to ONNX, then open a session over the converted model with onnxruntime and run it (a runnable sketch follows the BERT export below).

First, BERT: the conversion is explained well in the official tutorial.
https://github.com/huggingface/transformers/blob/master/notebooks/04-onnx-export.ipynb

from pathlib import Path

from transformers.convert_graph_to_onnx import convert

# Handles all the above steps (tracing, input naming, saving) for you
convert(framework="pt", model="bert-base-cased", output=Path("onnx/bert-base-cased.onnx"), opset=11)
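
Once the file exists, inference goes through an onnxruntime session. A minimal sketch, assuming the output path above; the graph's input names follow the tokenizer's (input_ids, attention_mask, token_type_ids for this model):

import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
session = ort.InferenceSession("onnx/bert-base-cased.onnx")

# return_tensors="np" gives int64 numpy arrays keyed by the graph's input names
encoded = tokenizer("Hello, ONNX!", return_tensors="np")
outputs = session.run(None, dict(encoded))
print(outputs[0].shape)  # last hidden state: (batch, sequence, hidden)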

T5, a seq2seq model that is especially useful for conditional generation, has also been wrapped up nicely as a library:
https://github.com/Ki6an/fastT5

fastT5 splits the model into an encoder, a decoder, and an lm_head; the lm_head is needed because the decoder is initialized with it. The model is therefore exported and saved as three separate ONNX graphs.

from fastT5 import export_and_get_onnx_model
from transformers import AutoTokenizer

model_name = 't5-small'
# Exports the three graphs and returns a wrapper that still supports generate()
model = export_and_get_onnx_model(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)
t_input = "translate English to French: The universe is a dark forest."
token = tokenizer(t_input, return_tensors='pt')

tokens = model.generate(input_ids=token['input_ids'],
                        attention_mask=token['attention_mask'],
                        num_beams=2)

output = tokenizer.decode(tokens.squeeze(), skip_special_tokens=True)
print(output)

์ถ”๊ฐ€์ ์œผ๋กœ fastT5์—๋„ wrapํ•œ quantization์ด ์žˆ๋‹ค. onnx์—์„œ quantize ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•ด์„œ quantization์„ ํ•  ์ˆ˜์žˆ๋Š”๋ฐ, ์‹คํ—˜ ๊ฒฐ๊ณผ 1ํผ์„ผํŠธ ์ •๋„์˜ ์ •ํ™•๋„๋ฅผ ๋–จ์–ด๋œจ๋ฆฌ์ง€๋งŒ ๋ชจ๋ธ์˜ ํฌ๊ธฐ๋ฅผ ์ ˆ๋ฐ˜์—์„œ 2/3 ์ •๋„๋กœ ์ค„์—ฌ์ค˜์„œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์กฐ๊ธˆ์ด๋ผ๋„ ์•„๋‚„ ์ˆ˜ ์žˆ๋‹ค.

์ถ”๊ฐ€์ ์œผ๋กœ(2) huggingface 4.6์ด์ƒ์—์„œ ๋Œ๋ ค์•ผ ๋œ๋‹ค.
๊ด€๋ จ์ด์Šˆ๋Š” ์ด๊ณณ https://github.com/huggingface/transformers/pull/10651

๊ทธ๋ฆฌ๊ณ  Xlnet์—์„œ๋Š”? ์ž…๋ ฅ์˜ encoder์ค‘ 1๊ฐœ๋ฅผ ๋นผ์•ผ๋˜๋Š”๋ฐโ€ฆ.. ๊ทธ๊ฒƒ์€ ๋ฐ”๋กœ