
A Look at Spring AI Alibaba's YuQueDocumentReader

yuyutoo 2025-04-30 21:00

This article takes a look at YuQueDocumentReader in Spring AI Alibaba.

YuQueDocumentReader

community/document-readers/spring-ai-alibaba-starter-document-reader-yuque/src/main/java/com/alibaba/cloud/ai/reader/yuque/YuQueDocumentReader.java

public class YuQueDocumentReader implements DocumentReader {

  private final DocumentParser parser;

  private final YuQueResource yuQueResource;

  public YuQueDocumentReader(YuQueResource yuQueResource, DocumentParser parser) {
    this.yuQueResource = yuQueResource;
    this.parser = parser;
  }

  @Override
  public List<Document> get() {
    try {
      List<Document> documents = parser.parse(yuQueResource.getInputStream());
      String source = yuQueResource.getResourcePath();

      for (Document doc : documents) {
        doc.getMetadata().put(YuQueResource.SOURCE, source);
      }

      return documents;
    }
    catch (IOException ioException) {
      throw new RuntimeException("Failed to load document from yuque: {}", ioException);
    }
  }

}

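The YuQueDocumentReader constructor takes a YuQueResource and a DocumentParser. Its get method parses the resource's input stream with the DocumentParser, then adds a SOURCE entry (the resource path) to the metadata of each resulting Document.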
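One detail in the catch block above may be worth a second look: RuntimeException treats its message as a plain string, so the SLF4J-style {} placeholder is never substituted. The sketch below uses a hypothetical class name and a made-up resource path to contrast the message as written with one possible alternative that concatenates the path. The alternative is only a suggestion, not what the library does.

```java
import java.io.IOException;

public class ExceptionMessageSketch {

    // Mirrors the message used in YuQueDocumentReader.get(): the "{}" stays literal.
    static RuntimeException asWritten(IOException cause) {
        return new RuntimeException("Failed to load document from yuque: {}", cause);
    }

    // Hypothetical alternative: put the resource path into the message explicitly.
    static RuntimeException withPath(String path, IOException cause) {
        return new RuntimeException("Failed to load document from yuque: " + path, cause);
    }

    public static void main(String[] args) {
        IOException cause = new IOException("stream closed");
        String path = "https://www.yuque.com/my-team/my-book/doc-123"; // made-up path

        // prints "Failed to load document from yuque: {}"
        System.out.println(asWritten(cause).getMessage());
        // prints "Failed to load document from yuque: https://www.yuque.com/my-team/my-book/doc-123"
        System.out.println(withPath(path, cause).getMessage());
    }
}
```

In both variants the original IOException is still available through getCause(), so only the readability of the message differs.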

YuQueResource

community/document-readers/spring-ai-alibaba-starter-document-reader-yuque/src/main/java/com/alibaba/cloud/ai/reader/yuque/YuQueResource.java

public class YuQueResource implements Resource {

  private static final String BASE_URL = "https://www.yuque.com";

  private static final String INFO_PATH = "/api/v2/hello";

  private static final String DOC_DETAIL_PATH = "/api/v2/repos/%s/%s/docs/%s";

  public static final String SOURCE = "source";

  public static final String SUPPORT_TYPE = "Doc";

  private final HttpClient httpClient;

  private final InputStream inputStream;

  private final URI uri;

  private final String resourcePath;

  private String groupLogin;

  private String bookSlug;

  private String id;

  public YuQueResource(String yuQueToken, String resourcePath) {

    this.resourcePath = resourcePath;

    this.httpClient = HttpClient.newBuilder().version(HttpClient.Version.HTTP_2).build();

    judgePathRule(resourcePath);
    judgeToken(yuQueToken);

    URI baseUri = URI.create(BASE_URL + DOC_DETAIL_PATH.formatted(groupLogin, bookSlug, id));

    HttpRequest httpRequest = HttpRequest.newBuilder()
      .header("X-Auth-Token", yuQueToken)
      .uri(baseUri)
      .GET()
      .build();

    try {
      HttpResponse<String> response = this.httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
      String body = response.body();
      // Parse the JSON response using Jackson
      ObjectMapper objectMapper = new ObjectMapper();
      JsonNode jsonObject = objectMapper.readTree(body);
      JsonNode dataObject = jsonObject.get("data");

      if (dataObject == null || !dataObject.isObject()) {
        throw new RuntimeException("Invalid response format: 'data' is not an object");
      }

      if (!Objects.equals(dataObject.get("type").asText(), SUPPORT_TYPE)) {
        throw new RuntimeException("Unsupported resource type, only support " + SUPPORT_TYPE);
      }

      inputStream = new ByteArrayInputStream(dataObject.get("body_html").asText().getBytes());
      uri = URI.create(resourcePath);

    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  /**
   * Judge resource path rule Official online doc
   * https://www.yuque.com/yuque/developer/openapi
   * @param resourcePath
   */
  private void judgePathRule(String resourcePath) {

    // Determine if the path conforms to this format: https://xx.xxx.com/aa/bb/cc
    String regex = "^https://[a-zA-Z0-9.-]+/([a-zA-Z0-9.-]+)/([a-zA-Z0-9.-]+)/([a-zA-Z0-9.-]+)$";

    Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(resourcePath);
    Assert.isTrue(matcher.matches(), "Invalid resource path");

    // Extract the captured groups
    this.groupLogin = matcher.group(1);
    this.bookSlug = matcher.group(2);
    this.id = matcher.group(3);
    Assert.isTrue(StringUtils.hasText(this.groupLogin), "Invalid resource path");
    Assert.isTrue(StringUtils.hasText(this.bookSlug), "Invalid resource path");
    Assert.isTrue(StringUtils.hasText(this.id), "Invalid resource path");
  }

  /**
   * judge yuQue token
   * @param yuQueToken User/Team token
   */
  private void judgeToken(String yuQueToken) {
    URI uri = URI.create(BASE_URL + INFO_PATH);

    HttpRequest httpRequest = HttpRequest.newBuilder().header("X-Auth-Token", yuQueToken).uri(uri).GET().build();

    try {
      HttpResponse<String> response = this.httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString());
      int statusCode = response.statusCode();
      Assert.isTrue(statusCode == 200, "Failed to auth YuQueToken");
    }
    catch (Exception e) {
      throw new RuntimeException(e);
    }
  }

  //......
}  

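The YuQueResource constructor takes a yuQueToken and a resourcePath. It first validates the path with judgePathRule and the token with judgeToken (a GET to /api/v2/hello that must return status 200). It then uses httpClient to request https://www.yuque.com/api/v2/repos/{groupLogin}/{bookSlug}/docs/{id}, checks that the document type is Doc, and wraps the body_html field of the response in the inputStream. The groupLogin, bookSlug, and id values are extracted from resourcePath by judgePathRule.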
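The path-to-URL mapping can be tried in isolation. The following is a minimal sketch, not the library's code: the class name YuQuePathSketch, the method toDocDetailUrl, and the sample path are hypothetical, while the regex, BASE_URL, and DOC_DETAIL_PATH are copied from the listing above. It uses only the JDK, so it throws IllegalArgumentException where the original uses Spring's Assert.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YuQuePathSketch {

    private static final String BASE_URL = "https://www.yuque.com";

    private static final String DOC_DETAIL_PATH = "/api/v2/repos/%s/%s/docs/%s";

    // Same rule as judgePathRule: https://host/groupLogin/bookSlug/id
    private static final Pattern PATH_RULE = Pattern
        .compile("^https://[a-zA-Z0-9.-]+/([a-zA-Z0-9.-]+)/([a-zA-Z0-9.-]+)/([a-zA-Z0-9.-]+)$");

    // Hypothetical helper: maps a document page URL to the doc-detail API URL.
    static String toDocDetailUrl(String resourcePath) {
        Matcher matcher = PATH_RULE.matcher(resourcePath);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Invalid resource path");
        }
        String groupLogin = matcher.group(1);
        String bookSlug = matcher.group(2);
        String id = matcher.group(3);
        return BASE_URL + String.format(DOC_DETAIL_PATH, groupLogin, bookSlug, id);
    }

    public static void main(String[] args) {
        // prints "https://www.yuque.com/api/v2/repos/my-team/my-book/docs/doc-123"
        System.out.println(toDocDetailUrl("https://www.yuque.com/my-team/my-book/doc-123"));
    }
}
```

Because the pattern is anchored at both ends and the character classes exclude "/", "?", and "_", a path with a trailing slash, a query string, an underscore in a segment, or fewer than three segments is rejected as invalid.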

Example

community/document-readers/spring-ai-alibaba-starter-document-reader-yuque/src/test/java/com/alibaba/cloud/ai/reader/yuque/YuQueDocumentLoaderIT.java

@EnabledIfEnvironmentVariable(named = "YUQUE_TOKEN", matches = ".+")
@EnabledIfEnvironmentVariable(named = "YUQUE_RESOURCE_PATH", matches = ".+")
class YuQueDocumentLoaderIT {

  private static final String YU_QUE_TOKEN = System.getenv("YUQUE_TOKEN");

  private static final String RESOURCE_PATH = System.getenv("YUQUE_RESOURCE_PATH");

  YuQueDocumentReader reader;

  YuQueResource source;

  static {
    if (YU_QUE_TOKEN == null || RESOURCE_PATH == null) {
      System.out
        .println("YUQUE_TOKEN or YUQUE_RESOURCE_PATH environment variable is not set. Tests will be skipped.");
    }
  }

  @BeforeEach
  public void beforeEach() {
    // Skip test if environment variables are not set
    Assumptions.assumeTrue(YU_QUE_TOKEN != null && !YU_QUE_TOKEN.isEmpty(),
        "Skipping test because YUQUE_TOKEN is not set");
    Assumptions.assumeTrue(RESOURCE_PATH != null && !RESOURCE_PATH.isEmpty(),
        "Skipping test because YUQUE_RESOURCE_PATH is not set");

    source = YuQueResource.builder().yuQueToken(YU_QUE_TOKEN).resourcePath(RESOURCE_PATH).build();
    reader = new YuQueDocumentReader(source, new TikaDocumentParser());
  }

  @Test
  public void should_load_file() {
    // Skip test if reader is not initialized
    Assumptions.assumeTrue(reader != null, "Skipping test because reader is not initialized");

    List<Document> document = reader.get();
    String content = document.get(0).getText();

    System.out.println(content);
  }

}

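Summary

spring-ai-alibaba-starter-document-reader-yuque provides YuQueDocumentReader. It fetches the document through YuQueResource, parses it into Document objects with a DocumentParser (for example, TikaDocumentParser), and finally adds a SOURCE entry to the metadata.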

doc

  • java2ai
