  • Hugging Face's Fresh New Features: Tools and Agents
    BIG DATA & AI 2023. 5. 13. 15:54
    ๋ฐ˜์‘ํ˜•
    🎈 This post covers Tool and Agent, two truly fresh-out-of-the-oven features from Hugging Face.
    They were released on May 10, 2023, less than a week ago!

     

    What is Hugging Face?

    Hugging Face is a company specializing in natural language processing (NLP) that contributes heavily to open-source NLP libraries and tools. It provides pretrained models for a wide range of NLP tasks such as translation, text classification, sentiment analysis, and question answering. Its most popular product is the Transformers library, built on top of deep learning frameworks such as PyTorch.

    Beyond Transformers, Hugging Face develops other open-source NLP tools, including tokenizers, datasets, and pipelines. It also runs the Hugging Face Hub, an online platform that serves as a repository for sharing and collaborating on NLP models and datasets.

    Overall, Hugging Face plays an important role in the NLP community through its open-source contributions and its innovative approach to NLP research and development.

     

    Transformers can do anything

    Hugging Face has now added two features: Transformers Agent and Custom Tools. The updated example code opens with the magical(?) phrase "Transformers can do anything", so let's take a look together. 😁

     


    Agent

    The structure of the Agent is as follows. You write the instruction you want carried out as a prompt and hand it to the Agent 😎. The Agent, a large language model, writes code that calls the tools suited to the prompt, and a Python interpreter runs that code. Through this sequence, the example shows "A river flowing through a frozen forest", an image caption, being produced as audio.

    It is honestly very simple, yet very convenient. Before Agent, you had to define a separate class for each feature and wire them together yourself; now the Agent takes care of that tedious, repetitive work for you. To use an analogy, it is like an AI omakase that serves a selection of high-end algorithms one after another.
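To make the flow concrete, here is a toy sketch of the instruction → prompt → generated code → interpreter loop. Everything in it is hypothetical and written only for illustration: `fake_llm`, `run`, and the two string-returning "tools" are stand-ins, not part of transformers, and the real agent sends the prompt to a hosted LLM instead.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real tools; they only shuffle strings around.
def image_generator(prompt: str) -> str:
    return f"<image: {prompt}>"


def image_captioner(image: str) -> str:
    return image[len("<image: "):-1]


TOOLS: Dict[str, Callable] = {
    "image_generator": image_generator,
    "image_captioner": image_captioner,
}


def fake_llm(prompt: str) -> str:
    """Pretend LLM: turns the task in the prompt into one line of tool-calling code."""
    instruction = prompt.split("Task:", 1)[1].strip()
    if "caption" in instruction.lower():
        return "result = image_captioner(image)"
    return f"result = image_generator({instruction!r})"


def run(instruction: str, **variables):
    """Build the prompt, ask the 'LLM' for code, then execute it with only the tools in scope."""
    prompt = f"Tools: {', '.join(TOOLS)}\nTask: {instruction}"
    code = fake_llm(prompt)
    namespace = {**TOOLS, **variables}
    exec(code, {"__builtins__": {}}, namespace)  # the "interpreter" step
    return namespace["result"]


boat = run("a boat in the water")
caption = run("Caption the image", image=boat)
print(boat)
print(caption)
```

The point of the sketch is the division of labor: the model only writes code, and a restricted interpreter that knows the tools does the actual work.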

     

    Using Image Generator

    This code runs on transformers v4.29.0 or later, the latest version at the time of writing.
    A Jupyter notebook environment is recommended. Images do not display well in VSCode.
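Before running the examples, it can help to confirm the installed version. The following is a minimal sketch using only the standard library; `parse_version` and `meets_minimum` are hypothetical helper names, and the parser deliberately ignores pre-release suffixes such as `.dev0`.

```python
from importlib import metadata
from itertools import takewhile

MIN_VERSION = (4, 29, 0)  # minimum stated in this post


def parse_version(text: str) -> tuple:
    """Turn '4.29.0' or '4.30.0.dev0' into (major, minor, patch); suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version_text: str, minimum: tuple = MIN_VERSION) -> bool:
    return parse_version(version_text) >= minimum


try:
    installed = metadata.version("transformers")
    status = "meets the v4.29.0 minimum" if meets_minimum(installed) else "is older than v4.29.0"
    print(f"transformers {installed} {status}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```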

     

    Let's build a simple image generator. (Thanks to Agent, this is now closer to a plain function call than to building anything.)

    # agent_name: "StarCoder (HF Token)"
    
    from transformers.tools import HfAgent
    
    agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
    print("StarCoder is initialized 💪")
    boat = agent.run("Generate an image of a boat in the water")
    boat

    Printing boat gives the following. In an instant, a picture of a boat on the water is done :)
    In short, you can now use an image generation model with just 2 lines of code.. lol

    Going the other way, let's ask the Agent to describe the image it generated. (image captioning)

    caption = agent.run("Can you caption the `boat_image`?", boat_image=boat)
    caption

     

    Reading out the Web Page (Playing Audio)

    The next Agent specialty is reading a web page aloud. It crawls the page with beautifulsoup and plays it back as audio; I was surprised at how sophisticated(?) this built-in feature is. This also takes just 2 lines!

    audio = agent.run("Read out loud the summary of http://hf.co")
    # play_audio is a helper defined in the accompanying notebook, not part of transformers
    play_audio(audio)
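Since `play_audio` is not defined in this post, here is a rough sketch of the kind of helper it could be. It is an assumption, not the notebook's actual code: `save_wav` is a hypothetical name, it uses only the standard library, and it assumes the agent's speech output can be treated as a mono waveform of floats in [-1, 1] at 16 kHz.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, path, sample_rate=16000):
    """Write mono float samples in [-1.0, 1.0] to a 16-bit PCM WAV file and return the path."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return path


# Demo with a 0.1 second 440 Hz tone instead of real agent output.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
demo_path = save_wav(tone, os.path.join(tempfile.gettempdir(), "agent_speech_demo.wav"))
print(demo_path)
```

In a notebook, `IPython.display.Audio(demo_path)` would then play the file. With the agent's output, something like `save_wav(audio.tolist(), "speech.wav")` should work if `audio` is a 1-D tensor, which is an assumption worth checking.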

    Out of curiosity I also tried it on Naver, a Korean site; the crawling worked fine, but the read-aloud model did not work well.

     

    Chat Mode

    The example code appears to have been run with OpenAI, but I don't have an OpenAI account, so I ran it with HfAgent instead, and it fails with an error. 😅 I hope the bug gets fixed later.

     


    Tools

    Hugging Face's tools provide the following capabilities.

    These tools are the following:

    • Document question answering: given a document (such as a PDF) in image format, answer a question on this document (Donut)
    • Text question answering: given a long text and a question, answer the question in the text (Flan-T5)
    • Unconditional image captioning: Caption the image! (BLIP)
    • Image question answering: given an image, answer a question on this image (ViLT)
    • Image segmentation: given an image and a prompt, output the segmentation mask of that prompt (CLIPSeg)
    • Speech to text: given an audio recording of a person talking, transcribe the speech into text (Whisper)
    • Text to speech: convert text to speech (SpeechT5)
    • Zero-shot text classification: given a text and a list of labels, identify to which label the text corresponds the most (BART)
    • Text summarization: summarize a long text in one or a few sentences (BART)
    • Translation: translate the text into a given language (NLLB)

     

    We also support the following community-based tools:

    • Text downloader: to download a text from a web URL
    • Text to image: generate an image according to a prompt, leveraging stable diffusion
    • Image transformation: transforms an image

     

    Adding New Tools

    These tools are very convenient, but the built-in ones alone are never quite enough. This is where customization comes in. If you want the language model to use your own function or model, create a Custom Tool and add it to the Agent. 😀😀

    Below is a page that streams cute cat pictures. The image is refreshed with a new one every second, so let's write a fetcher that grabs these images and add it as a custom tool.

    https://cataas.com/cat

    There are a few rules for declaring the class. It must have attributes for the name, the description, and the inputs and outputs. It must also define a __call__ method containing the inference code. The details are as follows.

    This class has a few needs:

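    • An attribute name, which corresponds to the name of the tool itself. To be in tune with other tools which have a performative name, we'll name it text-download-counter. (That name comes from the Hugging Face documentation's own example; the tool in this post is named cat_fetcher.)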
    • An attribute description, which will be used to populate the prompt of the agent.
    • inputs and outputs attributes. Defining this will help the python interpreter make educated choices about types, and will allow for a gradio-demo to be spawned when we push our tool to the Hub. They're both a list of expected values, which can be text, image, or audio.
    • A __call__ method which contains the inference code. This is the code we've played with above!
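As a quick illustration of the four requirements above, here is a small checker written only for this post. `tool_problems`, `DemoCatTool`, and `BrokenTool` are hypothetical names, and nothing here is part of transformers; the allowed types are the three listed above (text, image, audio).

```python
REQUIRED_ATTRIBUTES = ("name", "description", "inputs", "outputs")
ALLOWED_TYPES = {"text", "image", "audio"}


def tool_problems(tool_class) -> list:
    """Return a list of human-readable problems; an empty list means the class looks fine."""
    problems = []
    for attr in REQUIRED_ATTRIBUTES:
        if getattr(tool_class, attr, None) is None:
            problems.append(f"missing attribute: {attr}")
    for attr in ("inputs", "outputs"):
        for value in getattr(tool_class, attr, None) or []:
            if value not in ALLOWED_TYPES:
                problems.append(f"{attr} has unknown type: {value}")
    # Look through the class and its parents for an explicit __call__ definition.
    if not any("__call__" in vars(klass) for klass in tool_class.__mro__):
        problems.append("missing __call__ method")
    return problems


class DemoCatTool:
    name = "cat_fetcher"
    description = "Returns the image of a cat."
    inputs = []
    outputs = ["image"]

    def __call__(self):
        return "cat"


class BrokenTool:
    name = "broken"
    inputs = []
    outputs = ["picture"]


print(tool_problems(DemoCatTool))
print(tool_problems(BrokenTool))
```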

     

    A CatImageFetcher class that meets the conditions above can be written as follows.

    import requests
    from PIL import Image
    from transformers import Tool


    class CatImageFetcher(Tool):
        name = "cat_fetcher"
        description = ("This is a tool that fetches an actual image of a cat online. It takes no input, and returns the image of a cat.")

        inputs = []
        # The tool returns a PIL image, so the output type is "image", not "text".
        outputs = ["image"]

        def __call__(self):
            # Stream the response and let PIL decode the image from the raw byte stream.
            return Image.open(requests.get('https://cataas.com/cat', stream=True).raw).resize((256, 256))

    Simply running CatImageFetcher gives the following, meow. 😺

    Jackpot: a chubby cat

    Now that we have confirmed it works, let's add it to the Agent as a custom tool. Just 1 line!

    from transformers.tools import HfAgent
    
    agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[CatImageFetcher()])

    Now, when you prompt the agent to show you a cat image, it figures out on its own that it should use cat_fetcher() and brings one back. It is really fascinating and cool 😎😎

    Cat: ? Why am I here?

    As shown below, it can also be combined with the built-in tools.

    AI seems to be advancing rapidly not only in technology but also in ease of use. Honestly, there is so much abstraction these days that the more I learn, the less I feel I understand (ambiguous). I think the engineer's path suits me better than the researcher's. Let's work hard to keep up 😎

     

    References

    I uploaded the full code to GitHub. It is excerpted and adapted from transformers.

    https://github.com/yerimJu/transformers/blob/main/Transformers_can_do_anything.ipynb

     


    https://colab.research.google.com/drive/1c7MHD-T1forUPGcC_jlwsIptOzpG3hSj

     


Written by Emily.