How to identify and extract Q&A pairs from the raw documents? #6

@AlphaSue

Great work! I'm wondering how you identify Q&A pairs in web data. Is there a rule-based filter? Could you open-source this part?

The original paper mentions:
"Q&A Extraction Question-and-answer data is inherently well-structured and embodies a concentrated form of knowledge, making it valuable for problem-solving benchmarks (Maini et al., 2024). Recent work reveal that these data can be found in pre-training data with massive quantity (Yue et al., 2024). We thus integrate and further verify this in MegaMath. Our pipeline contains two steps: (1) identify and extract Q&A pairs from the raw documents; (2) refine the Q&A to make up or improve the intermediate reasoning steps."
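
To make the question concrete, here is a minimal sketch of the kind of rule-based filter I have in mind for step (1). It is purely an assumption on my side and not the MegaMath method, which the paper does not spell out: the cue patterns (`QUESTION_CUE`, `ANSWER_CUE`) and the helper `extract_qa_pairs` are hypothetical names, and the rules only catch documents that mark questions and answers with explicit line-initial labels.

```python
import re

# Hypothetical cue patterns (an assumption for illustration, not MegaMath's rules):
# a line starting with "Q:", "Question 2:", "Problem 3." opens a question,
# a line starting with "A:", "Answer:", "Solution:" opens an answer.
QUESTION_CUE = re.compile(
    r"^\s*(?:Q(?:uestion)?|Problem)\s*\d*\s*[:.)]\s*(.+)", re.IGNORECASE
)
ANSWER_CUE = re.compile(
    r"^\s*(?:A(?:nswer)?|Solution)\s*\d*\s*[:.)]\s*(.+)", re.IGNORECASE
)


def extract_qa_pairs(text):
    """Return (question, answer) pairs found via explicit line-initial cues."""
    pairs = []
    question, answer, target = [], [], None
    for line in text.splitlines():
        q = QUESTION_CUE.match(line)
        a = ANSWER_CUE.match(line)
        if q:
            # A new question closes the previous pair if it was complete.
            if question and answer:
                pairs.append((" ".join(question), " ".join(answer)))
            question, answer, target = [q.group(1).strip()], [], "q"
        elif a and question:
            answer, target = [a.group(1).strip()], "a"
        elif not line.strip():
            # A blank line ends the current block, so later text is not absorbed.
            target = None
        elif target == "q":
            question.append(line.strip())
        elif target == "a":
            answer.append(line.strip())
    # Flush the last pair at end of document.
    if question and answer:
        pairs.append((" ".join(question), " ".join(answer)))
    return pairs


if __name__ == "__main__":
    sample = "Q: What is 2 + 2?\nA: 4\n"
    print(extract_qa_pairs(sample))
```

A filter like this would miss Q&A content without explicit labels and would misfire on multiple-choice options such as "A) 3", which is why I'm curious whether the actual pipeline uses rules, a classifier, or an LLM for this step.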
