This is a personal space where I review papers and organize related documents.
08 Aug 2024 » Diary - Finding evidence of LLM distillation
08 Aug 2024 » Diary - LLM distillation evidence!!
03 Jul 2024 » Survey - Multimodal LLM fine-tuning dataset
02 Jul 2024 » Paper survey - Multimodal LLM
27 May 2024 » Anthropic post - Mapping the Mind of a Large Language Model
22 May 2024 » Toy project - Building a daily paper bot
24 Mar 2024 » Direct Alignment from Preferences - Part 03. Online DAP
25 Feb 2024 » Direct Alignment from Preferences - Part 02. DAP
22 Feb 2024 » Diary - SSH keygen issue
19 Feb 2024 » Direct Alignment from Preferences - Part 01. RLHF
01 Jan 2024 » Diary - 2023 retrospective
21 Dec 2023 » MLX: A machine learning framework for Apple silicon - 04. LLM inference example
17 Dec 2023 » MLX: A machine learning framework for Apple silicon - 03. Multi-Layer Perceptron example
16 Dec 2023 » MLX: A machine learning framework for Apple silicon - 02. Regression example
15 Dec 2023 » MLX: A machine learning framework for Apple silicon - 01. Quick-start
03 Oct 2023 » Diary - Using Ubuntu on macOS (Apple silicon M1)
06 May 2023 » Project review - Knowledge distillation from powerful LLM, Alpaca and Koala
26 Apr 2023 » Diary - Ignore value in Cross Entropy function of PyTorch
22 Mar 2023 » Cracking OpenAI - ChatGPT prompt design
21 Feb 2023 » MLOps study - Raviraja Week 6: CI/CD - GitHub Actions
02 Jan 2023 » Diary - 2022 retrospective
29 Nov 2022 » Diary - Netron
22 Nov 2022 » Diary - Gym 0.26.0 update (truncated, no seed)
15 Nov 2022 » Diary - Korean language settings in a Dockerfile
24 Oct 2022 » MLOps study - Raviraja Week 5: Docker
22 Oct 2022 » MLOps study - Raviraja Week 4: ONNX
17 Oct 2022 » MLOps study - Raviraja Week 3: DVC
13 Oct 2022 » MLOps study - Raviraja Week 2: Hydra
13 Oct 2022 » MLOps study - Raviraja Week 1: W&B
11 Oct 2022 » MLOps study - Raviraja Week 0: PyTorch Lightning
16 Aug 2022 » Diary - Squashed Gaussian policy for SAC
17 Jul 2022 » Diary - Network architecture for RL
25 Apr 2022 » Diary - Curious about GPU memory
09 Apr 2022 » Diary - Catastrophic performance drop of off-policy RL methods
19 Mar 2022 » Diary - A frequent mistake when using torch.distributions.Normal
06 Mar 2022 » Paper review - Diff-DAC: Distributed Actor-Critic for Average Multitask Deep Reinforcement Learning
11 Feb 2022 » Diary - How to solve discrete-SAC loss explosion problem?
30 Jan 2022 » Paper review - Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
29 Jan 2022 » Diary - How to use the proper tense when writing an academic paper?
23 Jan 2022 » Paper review - Interactive teaching strategies for agent training
17 Jan 2022 » Paper review - Learning latent representations to influence multi-agent interaction
10 Jan 2022 » Paper review - Learning trajectory preferences for manipulators via iterative improvement
01 Jan 2022 » Paper review - A survey of preference-based reinforcement learning methods