Open Access System for Information Sharing

Collection: Department of Computer Science & Engineering (컴퓨터공학과), 3. Theses_Ph.D.

Thesis

Full metadata record

Files in This Item: There are no files associated with this item.

DC Field	Value	Language
dc.contributor.author	김병주	-
dc.date.accessioned	2022-03-29T02:52:20Z	-
dc.date.available	2022-03-29T02:52:20Z	-
dc.date.issued	2021	-
dc.identifier.other	OAK-2015-08309	-
dc.identifier.uri	http://postech.dcollection.net/common/orgView/200000368231	ko_KR
dc.identifier.uri	https://oasis.postech.ac.kr/handle/2014.oak/111114	-
dc.description	Doctor	-
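dc.description.abstract	This thesis presents BlockLDA, a disk-based scalable training technique for learning a Latent Dirichlet Allocation (LDA) model, a type of topic model, from large-scale data on a single machine. When an LDA model is trained on large-scale data, neither the data nor the model may fit into the machine's memory, and the proposed technique is designed so that training remains efficient in this case. To minimize disk I/O, the bottleneck of the training process, the technique introduces 1) a data structure for block-wise training, 2) a model reduction algorithm that accounts for the changing sparsity of the model, and 3) a local scheduling technique that minimizes the page faults incurred while loading blocks. Comparisons with various existing multi-core and disk-based parallel LDA training techniques on a single machine experimentally demonstrate the superior scalability and efficiency of the proposed technique. The technique presented in this study allows ordinary users to easily train LDA models on large-scale data using commodity machines, and can thereby contribute to the commercialization of machine learning technology.	-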
dc.description.abstract	Latent Dirichlet Allocation (LDA) is a popular topic model widely used for analyzing text data. Recently, the volume of text data gathered even on a single machine has been increasing greatly. However, LDA inference on a single machine is limited by the sizes of the data and the model because of the memory bottleneck. A disk-based algorithm capable of processing large-scale data that do not fit into memory, while scaling well on a machine with limited memory resources, can address this challenge. This paper proposes BlockLDA, an efficient disk-based LDA inference algorithm that can infer an LDA model even when neither the data nor the model fits into memory. Because disk I/O is much slower than memory access, minimizing the amount of disk I/O is the most crucial factor in the efficiency of disk-based algorithms. BlockLDA manages the data and the model as sets of small blocks, which enables efficient disk I/O and block-wise LDA inference. In addition, it uses advanced techniques that minimize the amount of disk I/O, including 1) a space reduction algorithm that dynamically manages the block-wise model according to its changing sparsity and 2) a local scheduling algorithm that carefully selects the next data blocks so that the number of page faults is minimized. Our experimental results demonstrate that BlockLDA achieves better scalability and efficiency than its disk-based and in-memory competitors in a memory-limited environment.	-
dc.language	eng	-
dc.publisher	Pohang University of Science and Technology (포항공과대학교)	-
dc.title	Disk-based Scalable Topic Modeling Algorithm	-
dc.title.alternative	디스크 기반 확장성 있는 토픽 모델링 알고리즘 (Korean title: Disk-based Scalable Topic Modeling Algorithm)	-
dc.type	Thesis	-
dc.contributor.college	Graduate School, Department of Computer Science and Engineering	-
dc.date.degree	2021- 2	-
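
The abstract names two I/O-saving ideas (sparsity-aware space reduction and page-fault-minimizing local scheduling) but not their mechanics. As a rough illustration only, the Python sketch below models each under assumptions that are not from the thesis: a row of topic counts is stored in whichever of a dense or a sparse encoding is smaller (`block_bytes`), and model blocks live in a fixed-size LRU cache (`ModelBlockCache`) while a greedy scheduler (`schedule`) picks the next data block with the fewest uncached model blocks. All names, the byte costs, the LRU eviction policy and the greedy rule are hypothetical.

```python
from collections import OrderedDict

# Hypothetical sketch only: these are NOT the BlockLDA data structures or
# algorithms from the thesis. Block ids, byte costs, LRU eviction and the
# greedy selection rule are illustrative assumptions.

DENSE_ENTRY_BYTES = 4   # assumed: one 32-bit count per topic
SPARSE_ENTRY_BYTES = 8  # assumed: a (topic id, count) pair of 32-bit ints


def block_bytes(topic_counts):
    """Return (encoding, bytes) for the cheaper of a dense or sparse layout.

    A row of topic counts that becomes sparser during training can switch to
    the sparse layout, shrinking the model data that must be paged in.
    """
    nonzero = sum(1 for c in topic_counts if c != 0)
    dense = len(topic_counts) * DENSE_ENTRY_BYTES
    sparse = nonzero * SPARSE_ENTRY_BYTES
    return ("sparse", sparse) if sparse < dense else ("dense", dense)


class ModelBlockCache:
    """Fixed-capacity LRU cache of model blocks that counts page faults."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # block id -> placeholder, in LRU order
        self.faults = 0

    def access(self, block_id):
        if block_id in self.resident:
            self.resident.move_to_end(block_id)  # mark as most recently used
            return
        self.faults += 1  # block must be read from disk
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[block_id] = True


def schedule(data_blocks, capacity, greedy=True):
    """Visit every data block once and return (visit order, page faults).

    data_blocks maps a data-block id to the set of model-block ids it needs.
    With greedy=True, the next data block is the unvisited one with the fewest
    required model blocks not currently cached (ties: smallest id).
    With greedy=False, blocks are visited in id order.
    """
    cache = ModelBlockCache(capacity)
    remaining = sorted(data_blocks)
    order = []
    while remaining:
        if greedy:
            nxt = min(
                remaining,
                key=lambda b: (sum(m not in cache.resident for m in data_blocks[b]), b),
            )
        else:
            nxt = remaining[0]
        remaining.remove(nxt)
        for m in sorted(data_blocks[nxt]):
            cache.access(m)
        order.append(nxt)
    return order, cache.faults


if __name__ == "__main__":
    blocks = {0: {0, 1}, 1: {2, 3}, 2: {0, 1}, 3: {2, 3}}
    print(schedule(blocks, capacity=2, greedy=False))  # ([0, 1, 2, 3], 8)
    print(schedule(blocks, capacity=2, greedy=True))   # ([0, 2, 1, 3], 4)
    print(block_bytes([0, 0, 5, 0, 0, 0, 0, 0]))       # ('sparse', 8)
```

On this toy input, visiting the data blocks in id order causes 8 page faults, while the greedy order reuses the already cached model blocks first and causes 4. The greedy rule is only a heuristic and does not guarantee the minimum number of faults in general.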

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

library@postech.ac.kr Tel: 054-279-2548

Copyright © 2017 Pohang University of Science and Technology. All rights reserved.
