Efficient Memory Management for Large Language Model Serving with PagedAttention

June 20, 2023 · 16 min read

AI Engineer

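This post gives an in-depth explanation of vLLM, a memory-efficient serving system for improving large language model (LLM) performance. In particular, it covers how the PagedAttention algorithm overcomes inefficiencies in memory management, optimizes resource usage, and improves throughput by 2-4x. It also shows how this lowers the operating cost of LLM services and raises their efficiency, and offers insight into recent technical approaches.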

Efficient memory management has a large impact on the performance of large language models.
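
As a rough illustration of the paging idea behind PagedAttention, the sketch below maps each sequence's tokens onto fixed-size physical blocks drawn from a shared free list, so memory is claimed only as the sequence grows. It is a toy model, not vLLM's implementation: `BLOCK_SIZE`, `BlockAllocator`, and `Sequence` are hypothetical names chosen for this example, and no attention computation or GPU memory is involved.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # tokens per KV block (hypothetical value for illustration)


@dataclass
class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared free list."""
    num_blocks: int
    free: list = field(init=False)

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("no free KV blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


@dataclass
class Sequence:
    """Tracks one request's logical-to-physical block mapping."""
    allocator: BlockAllocator
    num_tokens: int = 0
    block_table: list = field(default_factory=list)

    def append_token(self) -> None:
        # A new physical block is needed only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free_all(self) -> None:
        # Return every block to the pool when the request finishes.
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(6):
    seq.append_token()

blocks_used = len(seq.block_table)
wasted_slots = blocks_used * BLOCK_SIZE - seq.num_tokens
```

In this toy model the unused slots are confined to the last block of a sequence, which is the intuition for why block-based allocation wastes less memory than reserving one contiguous region sized for the maximum length.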

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

June 1, 2023 · 11 min read

AI Engineer

Abstract

in terms of memory size and bandwidth, pose significant deployment challenges.

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages

March 6, 2023 · 9 min read

AI Engineer

Abstract and Introduction

Toolformer: Language Models Can Teach Themselves to Use Tools

February 9, 2023 · 4 min read

AI Engineer

Abstract

Language models (LMs) have achieved strong results on a number of tasks using only a few examples and textual instructions.

BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension

January 1, 2023 · 7 min read

AI Engineer

Robust Speech Recognition via Large-Scale Weak Supervision (Whisper)

December 6, 2022 · 7 min read

AI Engineer

Amount of dataset

A 680,000-hour dataset was used, chosen with both quantity and quality in mind.

LaMDA: Language Models for Dialog Applications

January 20, 2022 · 5 min read

AI Engineer

LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text.
The first challenge, safety, involves ensuring that the model’s responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias.

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for Everyone

December 6, 2021 · 8 min read

AI Engineer

Abstract

YourTTS brings a multilingual approach to the task of zero-shot multi-speaker TTS. The model is based on VITS [Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech], with several modifications for zero-shot multi-speaker and multilingual training. It achieves state-of-the-art (SOTA) results in zero-shot multi-speaker TTS, and results comparable to SOTA in zero-shot voice conversion on the VCTK dataset. It also shows promising results with a single-speaker dataset. In addition, with less than one minute of speech it still delivers good voice similarity and reasonable quality.

Welcome

August 26, 2021 · One min read

Sébastien Lorber

Docusaurus maintainer

Ex-Meta Staff Engineer, Co-founder GreatFrontEnd

Docusaurus blogging features are powered by the blog plugin.

Here are a few tips you might find useful.

MDX Blog Post

August 1, 2021 · One min read

Sébastien Lorber

Docusaurus maintainer

Blog posts support Docusaurus Markdown features, such as MDX.

tip

Use the power of React to create interactive blog posts.

VITS: Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

June 11, 2021 · 9 min read

AI Engineer

Variational Inference with adversarial learning for end-to-end Text-to-Speech (VITS).

Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech

DALL-E: Creating Images from Text

January 5, 2021 · One min read

AI Engineer

Notes on DALL-E.

Few-Shot Question Answering by Pretraining Span Selection (Splinter)

January 1, 2021 · 2 min read

AI Engineer

We explore the more realistic few-shot setting, where only a few hundred training examples are available, and observe that standard models perform poorly, highlighting the discrepancy between current pretraining objectives and question answering.
We propose a new pretraining scheme tailored for question answering: recurring span selection. Given a passage with multiple sets of recurring spans, we mask in each set all recurring spans but one, and ask the model to select the correct span in the passage for each masked span.
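
The masking step of recurring span selection can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the paper's implementation: it handles a single set of recurring spans that is already known, assumes occurrences do not overlap, and uses hypothetical names (`QUESTION`, `find_occurrences`, `mask_recurring_span`).

```python
QUESTION = "[QUESTION]"  # placeholder token standing in for a masked span


def find_occurrences(tokens, span):
    """Return the start index of every occurrence of `span` in `tokens`."""
    n = len(span)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == span]


def mask_recurring_span(tokens, span, keep=0):
    """Mask all occurrences of `span` except the `keep`-th one.

    Returns the masked token list and the (start, end) position of the
    surviving span inside it, which is what the model must select.
    """
    starts = find_occurrences(tokens, span)
    if len(starts) < 2:
        raise ValueError("span does not recur in the passage")
    kept_start = starts[keep]
    masked, answer, i = [], None, 0
    while i < len(tokens):
        if i in starts:
            if i == kept_start:
                answer = (len(masked), len(masked) + len(span) - 1)
                masked.extend(span)
            else:
                masked.append(QUESTION)
            i += len(span)
        else:
            masked.append(tokens[i])
            i += 1
    return masked, answer


passage = "Amelia Earhart flew solo . Later Amelia Earhart vanished".split()
masked, answer = mask_recurring_span(passage, ["Amelia", "Earhart"])
```

For each `[QUESTION]` position, the training target is the `answer` span, so the pretraining task has the same shape as extractive question answering.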

ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

March 23, 2020 · 2 min read

AI Engineer

ELECTRA : PRE-TRAINING TEXT ENCODERS AS DISCRIMINATORS RATHER THAN GENERATORS

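Masked language modeling (MLM) methods generally require a large amount of compute. As an alternative, this paper proposes a more efficient pre-training task called replaced token detection. Instead of masking the input, it replaces some tokens with tokens generated by a small generator model. Then, instead of predicting the original identities of the corrupted tokens, the model discriminates whether each token is a generated replacement or not.
As a result, it outperforms BERT given the same model size, data, and compute; it matches RoBERTa and XLNet with 1/4 of their compute, and outperforms them when given the same compute.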
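
To make the training signal concrete, the sketch below builds the per-token labels a discriminator would be trained on. It is only an illustration under stated assumptions: `corrupt` and `toy_generator` are hypothetical names, the generator is a hand-written stand-in rather than a trained model, and the selection probability is set to 1.0 so the example is deterministic.

```python
import random


def corrupt(tokens, sample_replacement, select_prob=0.15, rng=None):
    """Replace a random subset of tokens and label each position.

    Label 1 means "replaced", 0 means "original". If the generator happens
    to sample the original token, the position still counts as original.
    """
    rng = rng or random.Random(0)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            new_tok = sample_replacement(tok)
            corrupted.append(new_tok)
            labels.append(int(new_tok != tok))
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels


def toy_generator(tok):
    # Stand-in for the small generator: it only changes one word.
    return "ate" if tok == "cooked" else tok


tokens = ["the", "chef", "cooked", "the", "meal"]
corrupted, labels = corrupt(tokens, toy_generator, select_prob=1.0)
```

Because every position receives a label, the discriminator gets a learning signal from all input tokens rather than only from a masked subset, which is one reason this objective is more compute-efficient.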

Long Blog Post

May 29, 2019 · 3 min read

Ex-Meta Staff Engineer, Co-founder GreatFrontEnd

This is the summary of a very long blog post,

Use a <!-- truncate --> comment to limit blog post size in the list view.

First Blog Post

May 28, 2019 · One min read

Sébastien Lorber

Docusaurus maintainer

Ex-Meta Staff Engineer, Co-founder GreatFrontEnd

Lorem ipsum dolor sit amet...

MusicVAE: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music

March 27, 2018 · One min read

AI Engineer

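Variational autoencoders (VAEs) are hard to apply to sequences with long-term structure.
The paper therefore proposes a hierarchical decoder that first produces embeddings for subsequences of the output and then uses them to generate each subsequence independently.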

Transformer and BERT

December 6, 2017 · 11 min read

AI Engineer

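In 2018, Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, told [The New York Times] that
although machines still cannot represent ordinary human common sense, BERT marks a moment of explosive progress. This post looks at the [Transformer], the foundation of the BERT model: an Encoder-Decoder architecture built on the attention mechanism.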

End-to-End Neural Coreference Resolution

July 26, 2017 · 3 min read

AI Engineer

Coreference Resolution

Coreference resolution is an NLP task whose goal is to find the spans in a text that are mentions referring to the same entity.

Neural Machine Translation of Rare Words with Subword Units

August 31, 2016 · 4 min read

AI Engineer

Neural Machine Translation of Rare Words with Subword Units

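This paper applies byte pair encoding (BPE), originally a data compression technique, to natural language. By translating with subword units smaller than words, which lets the model handle phonological and morphological transformations, it introduces an open-vocabulary NMT model.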
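
A minimal sketch of how BPE merge operations can be learned from word frequencies is shown below. The function names (`count_pairs`, `merge_pair`, `learn_bpe`) and the toy vocabulary are choices made for this example; `</w>` marks the end of a word, and ties between equally frequent pairs are broken by first occurrence, which is an assumption of this sketch.

```python
from collections import Counter


def count_pairs(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Start from characters and repeatedly merge the most frequent pair."""
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = count_pairs(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab = merge_pair(best, vocab)
    return merges, vocab


merges, vocab = learn_bpe(
    {"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=3
)
```

On this toy vocabulary the first three merges build up the suffix `est</w>`, so a word such as "lowest" that never appeared in training can still be segmented into known subword units.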