Jeremy Kahn

Prior Labs raises €9 million to build a breakthrough AI model that can process data in tables and spreadsheets.


Photo courtesy of Prior Labs




A lot of information inside companies is what’s known as “tabular data,” or data that is presented in rows and columns. Think spreadsheets and database entries and lots of figures in reports.

Well, it turns out that artificial intelligence models have difficulty working with tabular data, for several reasons. It’s often a confusing jumble—sometimes text and sometimes numbers, as well as numbers in different units of measurement. What’s more, the relationship between different cells in the table is sometimes unclear. Knowing which cells influence which other cells in a table often requires domain expertise.

For years, machine learning researchers have been trying to crack this tabular data problem. Now, a group of researchers has found what they claim is an elegant solution: a large foundation model, similar to the large language models that underpin products like OpenAI’s ChatGPT, but trained specifically on tabular data. This pre-trained model can then be applied to any tabular data set and, with just a few examples, make accurate inferences about the relationships between data in various cells and also predict missing data better than any prior machine learning method.
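The key idea is that the model is not retrained for each new table: the labeled example rows are handed to it together with the rows to be predicted, and predictions come out in one pass. A minimal toy sketch of that in-context data flow follows, under loudly stated assumptions. The real TabPFN is a transformer pre-trained on synthetic tables; the stand-in below replaces it with a single parameter-free softmax-attention step (equivalently, a Gaussian-kernel weighted average of the example labels). The function name `in_context_predict` and its `temperature` parameter are hypothetical and are not part of any Prior Labs API.

```python
import math


def in_context_predict(context_rows, context_labels, query_rows, temperature=1.0):
    """Toy in-context predictor (hypothetical, NOT TabPFN's architecture).

    Nothing is fitted: labeled context rows are passed in together with the
    unlabeled query rows, and each query gets a softmax-weighted average of the
    context labels, scored by negative squared distance between rows.
    """
    predictions = []
    for query in query_rows:
        # Attention score for every context row: closer rows score higher.
        scores = [
            -sum((q - c) ** 2 for q, c in zip(query, row)) / temperature
            for row in context_rows
        ]
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # The weighted average of the context labels is this query's prediction.
        predictions.append(sum(w * y for w, y in zip(weights, context_labels)) / total)
    return predictions


# Usage: three labeled rows act as the "few examples"; two rows are queries.
context_rows = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
context_labels = [0.0, 0.0, 1.0]
query_rows = [[0.2, 0.1], [4.8, 5.1]]
preds = in_context_predict(context_rows, context_labels, query_rows)
print([round(p, 3) for p in preds])  # → [0.0, 1.0]
```

The point of the toy is only the shape of the computation: no training loop, and a prediction that depends on whichever example rows are supplied at query time. A pre-trained transformer learns far richer ways to weigh those rows than a fixed distance kernel.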

Frank Hutter and Noah Hollmann, two Germany-based computer scientists who helped pioneer this technique and recently published a paper on it in the prestigious scientific journal Nature, have teamed with Sauraj Gambhir, who has experience in finance, on a startup called Prior Labs dedicated to commercializing this technology.

Today Prior Labs, which is based in Freiburg, Germany, announced it has received 9 million euros ($9.3 million) in pre-seed funding. The round was led by London-based venture capital firm Balderton Capital, with participation from XTX Ventures, SAP cofounder Hans-Werner Hector’s Hector Foundation, Atlantic Labs, and Galion.exe. Several prominent angel investors also participated, including Hugging Face cofounder and chief scientist Thomas Wolf; Guy Podjarny, who founded Snyk and Tessl; and Ed Grefenstette, a well-known DeepMind researcher.

“Tabular data is the backbone of science and business, yet the AI revolution transforming text, images and video has had only a marginal impact on tabular data–until now,” James Wise, a partner at Balderton Capital, said in a statement, explaining why the firm decided to invest in Prior Labs.

The model Prior Labs used for its Nature study is called a Tabular Prior-Fitted Network (TabPFN for short). But TabPFN is trained only on the numerical data in tables, not the text. Hutter, a well-known AI researcher formerly at the University of Freiburg and the ELLIS Institute Tübingen, said Prior Labs wants to make the model multimodal, so that it can understand both numbers and text. The model would then be able to understand column headings and reason about them, and users would be able to interact with the AI system using natural language prompts, just like an LLM-based chatbot.

Today’s LLMs, even the more advanced reasoning models such as OpenAI’s o3, can answer some questions about what a table says, but they can’t make accurate predictions based on an analysis of the data in the table. “LLMs are just horrible at that,” Hutter said. “It’s like, it’s nowhere close. It’s not only that, it’s also super slow.” As a result, most people who need to analyze this kind of data have relied on older statistical methods that are fast but not always the most accurate.

But Prior Labs’ TabPFN can make accurate predictions, including on what are called time series, where past data is used to predict the next most likely data point based on complex patterns. In a new paper the Prior Labs team published in January on the non-peer-reviewed research repository arxiv.org, the team found that TabPFN outperformed existing time series prediction models. It beat the best previous small AI model for such predictions by 7.7% and beat a model 65 times larger than TabPFN by 3%.
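Predicting “the next most likely data point” can be recast as filling in a missing cell of an ordinary table, which is what lets a tabular model take on forecasting. A minimal sketch of one common way to do that recasting follows, assuming a lag-feature layout: each row pairs a time index with the few values that preceded it, and the target column holds the value that came next. This layout and the names `series_to_table` and `next_step_row` are illustrative assumptions; the featurization used in Prior Labs’ January paper may differ.

```python
def series_to_table(series, n_lags):
    """Turn a 1-D time series into (rows, targets) for a tabular regressor.

    Each row holds the time index t plus the n_lags values observed just
    before t; the target is the value observed at t.
    """
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("need n_lags >= 1 and len(series) > n_lags")
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append([t] + list(series[t - n_lags:t]))
        targets.append(series[t])
    return rows, targets


def next_step_row(series, n_lags):
    """Feature row for the next, not-yet-observed time step (the 'missing' cell)."""
    if n_lags < 1 or len(series) < n_lags:
        raise ValueError("need n_lags >= 1 and len(series) >= n_lags")
    t = len(series)
    return [t] + list(series[t - n_lags:t])


# Usage: a five-point series becomes a three-row table plus one query row.
rows, targets = series_to_table([10, 12, 14, 16, 18], n_lags=2)
print(rows)     # → [[2, 10, 12], [3, 12, 14], [4, 14, 16]]
print(targets)  # → [14, 16, 18]
print(next_step_row([10, 12, 14, 16, 18], n_lags=2))  # → [5, 16, 18]
```

The rows and targets form a plain table that any tabular regressor could accept; the query row is the one whose target the model would be asked to predict.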

Time series prediction has many applications across industries, but especially in medical and financial domains. “Hedge funds love us,” Hutter said. (One of Prior Labs’ initial customers is, in fact, a hedge fund, but Hutter said he was contractually barred from saying which one. Another initial customer with which Hutter is doing a proof of concept is software giant SAP.)

Prior Labs is offering TabPFN as an open source model—with the only license requirement being that if people use the model, they must publicly say so. So far, it has been downloaded about one million times, according to Hutter. Like most open source AI companies, Prior Labs plans to make money by working with specific customers to help them tailor the models to their use case and also by building tools and applications for specific market segments.

Prior Labs is not the only company working to crack AI’s limits when it comes to tabular data. Ikigai Labs, a startup founded by MIT data scientist Devavrat Shah, and French startup Neuralk AI are among the others applying deep learning methods, including generative AI, to tabular data. Researchers at Google and Microsoft have also been working on the problem. Google Cloud’s tabular data solutions are built in part on AutoML, a process that uses machine learning to automate the steps needed to create effective AI models, an area Hutter helped pioneer.

Hutter said Prior Labs intends to keep improving its models, working more on relational databases and time series and building the ability to do what is called “causal discovery,” in which a user asks which data points in a table have a causal relationship with other data in the table. Then there’s the chat feature that will let users ask questions of the tables through a chat-like interface. “All of this we will build in the first year,” he said.



