Image content question and answer method based on multi-modality low-rank dual-linear pooling

Bilinear pooling technology for image content question answering, applied in the field of deep neural networks, that addresses the problem of high computational complexity.

Active Publication Date: 2017-12-15

HANGZHOU DIANZI UNIV

5 Cites, 38 Cited by

Problems solved by technology

Feature fusion models based on the bilinear model have performed very well in many fields, such as fine-grained image classification, natural language processing, and recommendation systems. However, they pose a major challenge: high computational complexity.
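
As a rough illustration of why full bilinear fusion is costly, the sketch below counts parameters for a full bilinear map and for a rank-k factorization of it. The sizes m, n, o, and k are hypothetical values chosen for illustration; they do not come from the patent text.

```python
def full_bilinear_params(m, n, o):
    """A full bilinear map needs one m x n weight matrix per output dimension."""
    return m * n * o


def low_rank_bilinear_params(m, n, o, k):
    """Each m x n matrix is replaced by an m x k factor and an n x k factor."""
    return k * o * (m + n)


# Hypothetical sizes: image feature m, question feature n, output o, rank k.
m, n, o, k = 2048, 1024, 1000, 5
full = full_bilinear_params(m, n, o)
low_rank = low_rank_bilinear_params(m, n, o, k)
print(full, low_rank)
```

With these assumed sizes the factorized form needs roughly two orders of magnitude fewer parameters than the full form.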

Embodiment Construction

[0102] The detailed parameters of the present invention are described further below.

[0103] As shown in Figure 1, the present invention provides a deep neural network structure for image content question answering (Image Question Answer, IQA). The specific steps are as follows:

[0104] In step (1), the data is preprocessed and features are extracted from the images and the text, as follows:

[0105] The COCO-VQA dataset is used here as training and testing data.

[0106] 1-1. For image data, the existing 152-layer deep residual network (ResNet-152) model is used to extract image features. Specifically, each image is uniformly scaled to 448×448 and fed into the deep residual network, and the output of its res5c layer is taken as the image feature.
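
The size of the resulting feature map can be checked with simple arithmetic. The sketch below assumes the standard ResNet-152 layout, with an overall downsampling factor of 32 and 2048 channels at the last residual stage (res5c). Neither number is stated in the text above, and the function name is hypothetical.

```python
def res5c_feature_shape(input_size, total_stride=32, channels=2048):
    """Return (height, width, channels) of the last residual-stage output.

    total_stride and channels are assumptions about the standard ResNet-152
    architecture, not values taken from the patent text.
    """
    grid = input_size // total_stride
    return (grid, grid, channels)


shape = res5c_feature_shape(448)
print(shape)
```

Under these assumptions a 448×448 input gives a 14×14 grid of 2048-dimensional vectors, one per image region.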

[0107] 1-2. For question text data, each question is first segmented into words and a word dictionary is built from the questions. Only the first 15 words of each question are kept, and if the ...
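
The word segmentation, dictionary building, and 15-word limit described above might look like the following minimal sketch. The tokenizer (lowercasing plus a simple regular expression) and all function names are assumptions for illustration. The handling of questions shorter than 15 words is cut off in the text above, so it is deliberately not implemented here.

```python
import re

MAX_WORDS = 15  # from the text: only the first 15 words of each question are kept


def tokenize(question):
    """Lowercase a question and split it into word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", question.lower())


def build_dictionary(questions):
    """Map each distinct word to an integer index, in order of first appearance."""
    word_to_index = {}
    for question in questions:
        for word in tokenize(question):
            if word not in word_to_index:
                word_to_index[word] = len(word_to_index)
    return word_to_index


def encode(question, word_to_index):
    """Keep the first MAX_WORDS words and convert the known ones to indices."""
    words = tokenize(question)[:MAX_WORDS]
    return [word_to_index[w] for w in words if w in word_to_index]


questions = ["What color is the cat?", "Is the cat on the mat?"]
dictionary = build_dictionary(questions)
print(encode("What color is the cat?", dictionary))
```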

Abstract

The invention discloses an image content question answering method based on multi-modal low-rank bilinear pooling. The method comprises the following steps: 1, preprocessing the images and the question and answer text described in natural language; 2, fusing features with a multi-modal low-rank bilinear pooling model; 3, building a neural network structure based on the MFB pooling model and a co-attention model; 4, training the model, using the backpropagation algorithm to learn the neural network parameters. The invention provides a neural network model for image question answering, in particular a method for unified modeling of the cross-media image and question data and a network structure that learns "co-attention" to model the fine-grained features of images and questions, and it obtains the best results in the field of image question answering at the time of filing.
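
Step 2 can be sketched as follows. This is a minimal pure-Python illustration of low-rank bilinear pooling in general: project both inputs, multiply them element-wise, then sum-pool over windows of size k. It is not taken from the patent, and the names `U`, `V`, `k`, `matvec_t`, and `low_rank_bilinear_pool` are assumptions made for illustration.

```python
def matvec_t(matrix, vector):
    """Compute matrix^T @ vector for a row-major list-of-lists matrix."""
    cols = len(matrix[0])
    return [
        sum(matrix[r][c] * vector[r] for r in range(len(vector)))
        for c in range(cols)
    ]


def low_rank_bilinear_pool(x, y, U, V, k):
    """Fuse x and y as sum-pool(U^T x * V^T y) over windows of size k.

    U has shape len(x) by (k * o) and V has shape len(y) by (k * o), where o
    is the output size. Output i equals x^T (U_i V_i^T) y, so the full weight
    matrix for each output is never formed explicitly.
    """
    projected_x = matvec_t(U, x)
    projected_y = matvec_t(V, y)
    joint = [a * b for a, b in zip(projected_x, projected_y)]
    return [sum(joint[i:i + k]) for i in range(0, len(joint), k)]


# Tiny example with hypothetical values: one output dimension, rank k = 2.
x = [1, 2]
y = [3, 4, 5]
U = [[1, 0], [0, 1]]
V = [[1, 0], [0, 1], [1, 1]]
z = low_rank_bilinear_pool(x, y, U, V, k=2)
print(z)
```

The example prints `[26]`, which matches the explicit bilinear form x^T (U V^T) y computed by hand for these values.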

Description

Technical field

[0001] The present invention relates to a deep neural network for image question answering (Image Question Answer, IQA), in particular to a method for uniformly modeling image-question cross-media data and learning "co-attention" for modeling expression.

Background technique

[0002] "Cross-media" unified expression lies at the intersection of the research fields of computer vision and natural language processing. It aims to bridge the "semantic gap" between different media (such as images and text) and establish a unified semantic expression. Several currently active research directions build on the theory and methods of cross-media unified expression, such as image captioning (Image Captioning), image-text cross-media retrieval (Image-Text Cross-media Retrieval), and image question answering (Image Question Answering, IQA). The goal of image captioning is to give an image a summary of its c...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30, G06F17/27, G06N3/08

CPC: G06F16/583, G06F40/289, G06N3/084

Inventors: 俞俊, 余宙, 项晨钞

Owner: HANGZHOU DIANZI UNIV
