A Detailed Look at Video Action Recognition Models, with Code Practice

This article is shared from the Huawei Cloud community post "視頻動(dòng)作識(shí)別" (Video Action Recognition), by HWCloudAI.

Experiment Objectives

Through this case study, you will learn:

  1. How to train the C3D model and run inference with it, and how to run inference with the I3D model.
Notes
  1. This case is recommended to run with TensorFlow-1.13.1 and requires a GPU; see the ModelArts JupyterLab hardware specification guide (《ModelArts JupyterLab 硬件規(guī)格使用指南》) for how to switch hardware specs;
  2. If this is your first time using JupyterLab, see the ModelArts JupyterLab user guide (《ModelArts JupyterLab使用指導(dǎo)》);
  3. If you run into errors while using JupyterLab, see the ModelArts JupyterLab FAQ (《ModelArts JupyterLab常見問題解決辦法》) to troubleshoot them.
Experiment Steps

Case Introduction

Video action recognition analyzes a short video clip and determines what action the person in it is performing. It is related to, but different from, image recognition: image recognition classifies a single static image, whereas video action recognition must consider not only the static content of each frame but also the spatio-temporal relationships between frames. For example, from a single image of a person holding a half-open door, you cannot tell whether the door is being opened or closed.

Research on video analysis has a shorter history than research on image analysis and is more difficult. The first difficulty is that analyzing video requires substantial computing resources: a video has to be decomposed into images for analysis, so the amount of data a model must handle is enormous. Another important consideration is the temporal order of the action: the images extracted from the video must be linked through their temporal relationships before a judgment can be made, so the model has to account for time, and adding the time dimension greatly increases the number of parameters.

Thanks to the public release of datasets such as PASCAL VOC, ImageNet, and MS COCO, the image domain has produced many classic models. Are there classic models in video analysis as well? There are, and this case study introduces the classic models for video action recognition and puts them into practice with code.

1. Prepare the source code and data

This step prepares the source code and data needed for the case. The resources are stored in OBS; we download them with the ModelArts SDK and extract them into the current directory. After extraction, the current directory contains data, dataset_subset, and other files and directories: the pre-trained parameter files, the dataset, and the code.

import os
import moxing as mox

if not os.path.exists('videos'):
    mox.file.copy("obs://ai-course-common-26-bj4-v2/video/video.tar.gz", "./video.tar.gz")
    # Extract the archive with the tar command
    os.system("tar xf ./video.tar.gz")
    # Delete the archive with the rm command
    os.system("rm ./video.tar.gz")

INFO:root:Using MoXing-v1.17.3
INFO:root:Using OBS-Python-SDK-3.20.7

In the previous lesson we introduced the three commonly used datasets for video action recognition: HMDB51, UCF-101, and Kinetics. This case uses a subset of UCF-101 as the demonstration dataset. Next, let's play a video from UCF-101:

video_name = "./data/v_TaiChi_g01_c01.avi"

from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np

def arrayShow(img):
    _, ret = cv2.imencode('.jpg', img)
    return Image(data=ret)

cap = cv2.VideoCapture(video_name)
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            tmp = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            img = arrayShow(frame)
            display(img)
            time.sleep(0.05)
        else:
            break
    except KeyboardInterrupt:
        cap.release()
cap.release()

2. Video action recognition models

In the image domain, ImageNet is a large image recognition dataset. Since 2010, algorithms trained on it have appeared one after another; deep learning models evolved from AlexNet to VGG-16 and then to even more complex architectures, and their performance kept improving. Their error rates when recognizing a thousand image categories are shown below:

A model that performs well at image recognition can be reused for other tasks in the image domain: reusing the parameters of some of its layers improves training. With image models built on ImageNet, many models and tasks gained a much better training foundation, for example object detection, instance segmentation, face detection, and face recognition.

So can image models with such strong training results be used to train video models? The answer is yes: research has shown that in the video domain, reusing an image model's structure, and even its parameters, helps video model training greatly. But how can the structure of an image model be reused? First we need to understand how video classification differs from image classification. If a video is viewed as a collection of images, each frame is one image; besides what appears in each frame, a video classification task must also consider the spatio-temporal relationships between frames before it can classify the action.

To capture the spatio-temporal relationships between frames, the I3D paper reviews three older video classification models and proposes a more effective one, the Two-Stream Inflated 3D ConvNets (I3D for short). The four models are briefly introduced below; see the original paper for more details.

Old model 1: ConvNet + LSTM

This model uses a mature, well-trained image model: a convolutional network extracts features from each frame, followed by pooling and prediction, and an LSTM layer (long short-term memory network) is added at the end of the model, as shown in the figure below. This lets the model take temporal structure into account and link contextual features together when judging the action. The drawbacks are that it can only capture large movements and does poorly on small ones, and because every frame of the video has to pass through the network, training takes a long time.
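
To make the structure concrete, here is a minimal Keras sketch of the ConvNet + LSTM idea (the layer sizes and the 101-class output are illustrative assumptions, not the architecture from the paper): a small 2D CNN is applied to every frame via TimeDistributed, and an LSTM fuses the per-frame features along the time axis.

from keras.layers import Input, TimeDistributed, Conv2D, MaxPool2D, Flatten, LSTM, Dense
from keras.models import Model

frames, height, width, channels = 16, 112, 112, 3
inputs = Input((frames, height, width, channels))
# The same 2D CNN is applied to each of the 16 frames
x = TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same'))(inputs)
x = TimeDistributed(MaxPool2D((2, 2)))(x)
x = TimeDistributed(Flatten())(x)               # one feature vector per frame
x = LSTM(256)(x)                                # link the frame features over time
outputs = Dense(101, activation='softmax')(x)   # e.g. the 101 UCF-101 classes
model = Model(inputs, outputs)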

Old model 2: 3D convolutional network

3D convolution is similar to 2D convolution but adds temporal information to the convolution operation. Although this looks like a more natural way to process video, the extra kernel dimension increases the number of parameters and makes the model harder to train. This model does not reuse an image model; the video data is fed directly into a 3D convolutional network for training.

Old model 3: Two-Stream network

The two streams of the Two-Stream network are a single RGB snapshot and a stack of 10 pre-computed optical flow frames. Both streams pass through an image convolutional network pre-trained on ImageNet. The optical flow part is split into vertical and horizontal channels, so its input is twice the size of an ordinary image input. The model performs very well in both training and testing.
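
As a rough illustration, here is a hypothetical two-stream sketch in Keras (tiny networks with made-up sizes, only to show the two inputs and the late fusion; the original work uses full ImageNet-pretrained ConvNets): one stream takes a single RGB frame, the other takes a stack of 10 optical flow frames split into horizontal and vertical channels (20 channels), and the class scores of the two streams are averaged.

from keras.layers import Input, Conv2D, MaxPool2D, Flatten, Dense, Average
from keras.models import Model

def small_cnn(inp):
    # Stand-in for an ImageNet-pretrained image ConvNet
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(inp)
    x = MaxPool2D((2, 2))(x)
    x = Flatten()(x)
    return Dense(101, activation='softmax')(x)

rgb_in = Input((112, 112, 3))    # spatial stream: one RGB snapshot
flow_in = Input((112, 112, 20))  # temporal stream: 10 flow frames x 2 directions
fused = Average()([small_cnn(rgb_in), small_cnn(flow_in)])  # late fusion of class scores
model = Model([rgb_in, flow_in], fused)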

Optical flow video

Having mentioned optical flow, let's introduce it. What is optical flow? The name sounds technical and unfamiliar, but we experience this visual phenomenon every day. When we ride a high-speed train, the scenery outside the window rushes backwards; the faster the train goes, the more it blurs past in a streak. This perceived direction and speed of visual motion is optical flow. Conceptually, optical flow is an observation of object motion: by finding the correlation between adjacent frames it establishes the correspondence between them, computes the motion of objects across adjacent frames, and obtains the instantaneous velocity of pixel motion. A raw video contains moving parts and a static background; usually we only need to judge the state of the moving parts, and optical flow is exactly the computed motion information of those parts.
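
If you want to compute optical flow yourself, here is a small sketch using OpenCV's Farneback dense optical flow (one of several possible algorithms; the video path reuses the Tai Chi clip from earlier, and the parameter values are common defaults rather than tuned settings):

import cv2

cap = cv2.VideoCapture("./data/v_TaiChi_g01_c01.avi")
ret, prev = cap.read()    # first frame
ret, curr = cap.read()    # second frame
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

# flow has shape (H, W, 2): per-pixel horizontal and vertical displacement between the two frames
flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
magnitude, angle = cv2.cartToPolar(flow[..., 0], flow[..., 1])
print(flow.shape, magnitude.mean())
cap.release()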

Below are an original video and the optical flow video computed from it.

Original video

Optical flow video

New model: Two-Stream Inflated 3D ConvNets

The new model makes the following structural improvements:

  • Inflate 2D convolutions to 3D. The mature image classification model is reused directly, except that the 2D N × N filters and pooling kernels in the network become N × N × N;
  • Initialize the 3D filter parameters from the pre-trained 2D filter parameters. The previous step reuses the structure of the image classification network; this step reuses its pre-trained parameters as well, by copying each 2D filter's parameters N times along the third (time) dimension and then dividing all the values by N (a NumPy sketch of this follows the list);
  • Adjust the shape and size of the receptive field. The new model modifies the structure of the Inception-v1 image classification model: the first two max-pooling layers use 1 × 3 × 3 kernels with stride 1 in time, all other max-pooling layers keep symmetric kernels and strides, and the last average-pooling layer uses a 2 × 7 × 7 kernel.
  • Keep the basic Two-Stream approach. Using a two-stream structure to capture the spatio-temporal relationships between frames is still effective.
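
The second point above (inflating 2D weights into 3D) can be sketched in a few lines of NumPy: the pre-trained 2D filter is copied N times along the new time dimension and divided by N, so that on a video made of identical frames the inflated filter produces the same response as the original 2D filter (the filter shape below is just an example).

import numpy as np

N = 3                                  # temporal extent of the inflated filter
w2d = np.random.randn(7, 7, 3, 64)     # e.g. a 7x7 conv with 3 input / 64 output channels
w3d = np.repeat(w2d[np.newaxis, ...], N, axis=0) / N   # shape (N, 7, 7, 3, 64)

# Sanity check: summing the inflated filter over time recovers the 2D filter
assert np.allclose(w3d.sum(axis=0), w2d)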

Finally, the overall structure of the new model is shown in the figure below:

Good. So far we have covered the classic datasets and classic models for video action recognition. Next we will actually run two of the models in code: the C3D model (3D convolutional network) and the I3D model (Two-Stream Inflated 3D ConvNets).

C3D model structure

As noted under "Old model 2: 3D convolutional network", a 3D ConvNet is a fairly natural way to process video. Although its accuracy is not the best and its computational cost is high, its structure is simple, and a very simple network is enough to perform video action recognition. The figure below illustrates 3D convolution:

In (a), 2D convolution is applied to a single image; in (b), 2D convolution is applied to a video, treating multiple frames as multiple channels; in (c), 3D convolution is applied to a video, adding temporal information to the input signal.

In (a) and (b) the output is a single 2D feature map: whether or not the input contains temporal information, the output is two-dimensional, so 2D convolution loses the temporal information. Only 3D convolution preserves temporal information in its output. 2D and 3D pooling behave the same way.
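
A quick shape check in Keras illustrates this point (the filter counts are arbitrary): folding the frames into the channel axis and applying 2D convolution returns a single 2D feature map, while 3D convolution keeps a separate frame axis in its output.

from keras.layers import Input, Conv2D, Conv3D

# 16 frames folded into the channel dimension (16 x 3 = 48 channels)
stack_2d = Input((112, 112, 48))
# 16 frames kept as an explicit dimension, matching the (width, height, frames, channels) layout used below
video_3d = Input((112, 112, 16, 3))

out_2d = Conv2D(64, (3, 3), padding='same')(stack_2d)     # shape (?, 112, 112, 64): time is gone
out_3d = Conv3D(64, (3, 3, 3), padding='same')(video_3d)  # shape (?, 112, 112, 16, 64): time preserved
print(out_2d.shape, out_3d.shape)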

The figure below shows a variant of the C3D network (for the original description, see Section 2.2 of the I3D paper):

The C3D structure consists of 8 convolutional layers, 5 max-pooling layers, and 2 fully connected layers, followed by a softmax output layer.

All 3D convolution kernels are 3 × 3 × 3 with stride 1. Training uses SGD with an initial learning rate of 0.003, divided by 2 every 150k iterations. Optimization stops after 1.9M iterations, about 13 epochs.

For data processing, a video clip is defined as c × l × h × w, where c is the number of channels, l the number of frames, h the frame height, and w the frame width. The 3D convolution and pooling kernels are d × k × k, where d is the kernel's temporal depth and k its spatial size. The network's input is frames extracted from a video, and its output is a class label. All video frames are resized to 128 × 171, roughly half the original frame size in the UCF-101 dataset. Each video is split into non-overlapping 16-frame clips, which serve as the network input. Finally the frames are cropped, giving an input of 16 × 112 × 112.
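
As a tiny sanity check of these dimensions (assuming the c × l × h × w layout from the notation above), the crop from 128 × 171 frames down to the 16 × 112 × 112 network input looks like this:

import numpy as np

clip = np.zeros((3, 16, 128, 171))   # c x l x h x w: 3 channels, 16 frames, resized to 128 x 171
cropped = clip[:, :, 8:120, 30:142]  # a 112 x 112 spatial crop, as in the test code later
print(cropped.shape)                 # (3, 16, 112, 112)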

3. C3D model training

Next we train the C3D model. Training consists of data preprocessing followed by model training. The dataset used here is UCF-101. Because the C3D model takes individual video frames as input, we need to extract frames from the dataset's videos, that is, convert the videos into images, and then feed the image data into the model for training.

In this case we randomly selected a portion of UCF-101 for the training demonstration; if you are interested, you can download the full UCF-101 dataset and train on it.

UCF-101 download

The dataset is stored in the dataset_subset directory.

The following code uses the cv2 library to convert the video files into image files.

import cv2
import os

# Location of the video dataset
video_path = './dataset_subset/'
# Location where the generated image dataset is stored
save_path = './dataset/'
# Create the path if it does not exist
if not os.path.exists(save_path):
    os.mkdir(save_path)

# Get the list of actions
action_list = os.listdir(video_path)
# Iterate over all actions
for action in action_list:
    if action.startswith(".") == False:
        if not os.path.exists(save_path + action):
            os.mkdir(save_path + action)
        video_list = os.listdir(video_path + action)
        # Iterate over all videos
        for video in video_list:
            prefix = video.split('.')[0]
            if not os.path.exists(os.path.join(save_path, action, prefix)):
                os.mkdir(os.path.join(save_path, action, prefix))
            save_name = os.path.join(save_path, action, prefix) + '/'
            video_name = video_path + action + '/' + video
            # Open the video file; cap yields the video frames
            cap = cv2.VideoCapture(video_name)
            # fps here is the total number of frames in the video
            fps = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            fps_count = 0
            for i in range(fps):
                ret, frame = cap.read()
                if ret:
                    # Write the frame to an image file
                    cv2.imwrite(save_name + str(10000 + fps_count) + '.jpg', frame)
                    fps_count += 1

At this point the videos have been converted frame by frame into image data and stored, ready for model training.

4. Model training

First, we build the model structure.

We introduced the C3D model structure earlier; here we build it with the Conv3D, MaxPool3D, ZeroPadding3D, and other layers provided by Keras.

from keras.layers import Dense, Dropout, Conv3D, Input, MaxPool3D, Flatten, Activation, ZeroPadding3D
from keras.regularizers import l2
from keras.models import Model, Sequential

# The input is 112 x 112 images, 16 frames, 3 channels
input_shape = (112, 112, 16, 3)
# Weight decay rate
weight_decay = 0.005
# Number of classes; we use UCF-101, so 101
nb_classes = 101

# Build the model structure
inputs = Input(input_shape)
x = Conv3D(64, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(inputs)
x = MaxPool3D((2, 2, 1), strides=(2, 2, 1), padding='same')(x)

x = Conv3D(128, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)

x = Conv3D(128, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)

x = Conv3D(256, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)

x = Conv3D(256, (3, 3, 3), strides=(1, 1, 1), padding='same',
           activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = MaxPool3D((2, 2, 2), strides=(2, 2, 2), padding='same')(x)

x = Flatten()(x)
x = Dense(2048, activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(2048, activation='relu', kernel_regularizer=l2(weight_decay))(x)
x = Dropout(0.5)(x)
x = Dense(nb_classes, kernel_regularizer=l2(weight_decay))(x)
x = Activation('softmax')(x)

model = Model(inputs, x)

Using TensorFlow backend.
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.

Keras's summary() method prints the model structure, showing how the layers are assembled and the input and output of each layer.

model.summary()

(The output is long and omitted here.)

Keras's model.input attribute shows the model's input shape; the dimensions are (batch size, width, height, frames, channels).

model.input

<tf.Tensor 'input_1:0' shape=(?, 112, 112, 16, 3) dtype=float32>

The data dimensions differ slightly from those of an image-processing model: there is an extra frames dimension, reflecting the influence of temporal relationships in video analysis.

Next, we convert the image files into the data format needed for training.

# Import the required libraries
from keras.optimizers import SGD, Adam
from keras.utils import np_utils
import numpy as np
import random
import cv2
import matplotlib.pyplot as plt
# Custom callbacks
from schedules import onetenth_4_8_12

INFO:matplotlib.font_manager:font search path ['/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/ttf', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/afm', '/home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/matplotlib/mpl-data/fonts/pdfcorefonts']
INFO:matplotlib.font_manager:generated new fontManager

Parameter definitions

img_path = save_path        # Location of the image files
results_path = './results'  # Location where training results are saved
if not os.path.exists(results_path):
    os.mkdir(results_path)

Split the dataset: randomly take 4/5 of it as the training set and use the rest as the validation set. The file names are stored in train_list and test_list in preparation for training.

cates = os.listdir(img_path)
train_list = []
test_list = []
# Iterate over all action categories
for cate in cates:
    videos = os.listdir(os.path.join(img_path, cate))
    length = len(videos) // 5
    # Training set size: randomly pick video files for the training set
    train = random.sample(videos, length * 4)
    train_list.extend(train)
    # Add the remaining videos to the test set
    for video in videos:
        if video not in train:
            test_list.append(video)

print("Training set:")
print(train_list)
print("%d videos in total\n" % (len(train_list)))
print("Validation set:")
print(test_list)
print("%d videos in total" % (len(test_list)))

(The output is long and omitted here.)

Next, we start training the model.

First, define the data-reading method. process_data reads one batch of data, containing 16-frame image data and the corresponding labels. When reading the images, random cropping and flipping are applied for data augmentation.

def process_data(img_path, file_list, batch_size=16, train=True):
    batch = np.zeros((batch_size, 16, 112, 112, 3), dtype='float32')
    labels = np.zeros(batch_size, dtype='int')
    cate_list = os.listdir(img_path)

    def read_classes():
        path = "./classInd.txt"
        with open(path, "r+") as f:
            lines = f.readlines()
        classes = {}
        for line in lines:
            c_id = line.split()[0]
            c_name = line.split()[1]
            classes[c_name] = c_id
        return classes

    classes_dict = read_classes()
    for file in file_list:
        cate = file.split("_")[1]
        img_list = os.listdir(os.path.join(img_path, cate, file))
        img_list.sort()
        batch_img = []
        for i in range(batch_size):
            path = os.path.join(img_path, cate, file)
            label = int(classes_dict[cate]) - 1
            symbol = len(img_list) // 16
            if train:
                # Random crop
                crop_x = random.randint(0, 15)
                crop_y = random.randint(0, 58)
                # Random flip
                is_flip = random.randint(0, 1)
                # Process 16 frames as one unit
                for j in range(16):
                    img = img_list[symbol + j]
                    image = cv2.imread(path + '/' + img)
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    image = cv2.resize(image, (171, 128))
                    if is_flip == 1:
                        image = cv2.flip(image, 1)
                    batch[i][j][:][:][:] = image[crop_x:crop_x + 112, crop_y:crop_y + 112, :]
                symbol -= 1
                if symbol < 0:
                    break
                labels[i] = label
            else:
                for j in range(16):
                    img = img_list[symbol + j]
                    image = cv2.imread(path + '/' + img)
                    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
                    image = cv2.resize(image, (171, 128))
                    batch[i][j][:][:][:] = image[8:120, 30:142, :]
                symbol -= 1
                if symbol < 0:
                    break
                labels[i] = label
    return batch, labels

batch, labels = process_data(img_path, train_list)
print("Shape of each batch: %s" % (str(batch.shape)))
print("Shape of the labels: %s" % (str(labels.shape)))

Shape of each batch: (16, 16, 112, 112, 3)
Shape of the labels: (16,)

Define the data generators that feed batches of data into the training function.

def generator_train_batch(train_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of data
        x_train, x_labels = process_data(img_path, train_list, batch_size=16, train=True)
        x = preprocess(x_train)
        # Convert to the format required by the model input
        y = np_utils.to_categorical(np.array(x_labels), num_classes)
        x = np.transpose(x, (0, 2, 3, 1, 4))
        yield x, y

def generator_val_batch(test_list, batch_size, num_classes, img_path):
    while True:
        # Read one batch of data from the validation list
        y_test, y_labels = process_data(img_path, test_list, batch_size=16, train=False)
        x = preprocess(y_test)
        # Convert to the format required by the model input
        x = np.transpose(x, (0, 2, 3, 1, 4))
        y = np_utils.to_categorical(np.array(y_labels), num_classes)
        yield x, y

Define the preprocess method, which normalizes the input image data.

def preprocess(inputs):
    inputs[..., 0] -= 99.9
    inputs[..., 1] -= 92.1
    inputs[..., 2] -= 82.6
    inputs[..., 0] /= 65.8
    inputs[..., 1] /= 62.3
    inputs[..., 2] /= 60.3
    return inputs

# Training one epoch takes about 4 minutes
# Number of classes
num_classes = 101
# Batch size
batch_size = 4
# Number of epochs
epochs = 1
# Learning rate
lr = 0.005
# Define the optimizer
sgd = SGD(lr=lr, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Start training
history = model.fit_generator(generator_train_batch(train_list, batch_size, num_classes, img_path),
                              steps_per_epoch=len(train_list) // batch_size,
                              epochs=epochs,
                              callbacks=[onetenth_4_8_12(lr)],
                              validation_data=generator_val_batch(test_list, batch_size, num_classes, img_path),
                              validation_steps=len(test_list) // batch_size,
                              verbose=1)
# Save the trained weights
model.save_weights(os.path.join(results_path, 'weights_c3d.h5'))

WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/1
20/20 [==============================] - 442s 22s/step - loss: 28.7099 - acc: 0.9344 - val_loss: 27.7600 - val_acc: 1.0000

5. Model testing

Next we test the trained model. We randomly choose a video file from UCF-101 as test data, extract its frames, feed every 16 frames into the model for one action prediction, draw the predicted action and its probability onto the frames, and play the video.

First, import the required libraries.

from IPython.display import clear_output, Image, display, HTML
import time
import cv2
import base64
import numpy as np

Build the model structure and load the weights.

from models import c3d_model
model = c3d_model()
# Load the weights we just trained
model.load_weights(os.path.join(results_path, 'weights_c3d.h5'), by_name=True)

Define the function arrayShow, which converts an image array into an encoded format suitable for display.

def arrayShow(img):
    _, ret = cv2.imencode('.jpg', img)
    return Image(data=ret)

Preprocess the video and run prediction, draw the prediction results onto the frames, and finally play the video.

# Load all class names and indices
with open('./ucfTrainTestlist/classInd.txt', 'r') as f:
    class_names = f.readlines()
    f.close()

# Open the video file
video = './videos/v_Punch_g03_c01.avi'
cap = cv2.VideoCapture(video)
clip = []

# Feed the video frames into the model
while True:
    try:
        clear_output(wait=True)
        ret, frame = cap.read()
        if ret:
            tmp = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            clip.append(cv2.resize(tmp, (171, 128)))
            # Run one prediction every 16 frames
            if len(clip) == 16:
                inputs = np.array(clip).astype(np.float32)
                inputs = np.expand_dims(inputs, axis=0)
                inputs[..., 0] -= 99.9
                inputs[..., 1] -= 92.1
                inputs[..., 2] -= 82.6
                inputs[..., 0] /= 65.8
                inputs[..., 1] /= 62.3
                inputs[..., 2] /= 60.3
                inputs = inputs[:, :, 8:120, 30:142, :]
                inputs = np.transpose(inputs, (0, 2, 3, 1, 4))
                # Get the prediction
                pred = model.predict(inputs)
                label = np.argmax(pred[0])
                # Draw the prediction onto the frame
                cv2.putText(frame, class_names[label].split(' ')[-1].strip(),
                            (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 1)
                cv2.putText(frame, "prob: %.4f" % pred[0][label],
                            (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 1)
                clip.pop(0)
            # Play the annotated video
            lines, columns, _ = frame.shape
            frame = cv2.resize(frame, (int(columns), int(lines)))
            img = arrayShow(frame)
            display(img)
            time.sleep(0.02)
        else:
            break
    except:
        print(0)
cap.release()

6. The I3D model

We briefly introduced the I3D model earlier. The official I3D GitHub repository provides models pre-trained on Kinetics along with the prediction code; next we try out how I3D predicts on a video.

First, import the required packages.

import numpy as np
import tensorflow as tf
import i3d

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Define the parameters.

# Input image size
_IMAGE_SIZE = 224
# Number of video frames
_SAMPLE_VIDEO_FRAMES = 79
# The input consists of two parts: RGB and optical flow
# The RGB and flow data have been computed in advance
_SAMPLE_PATHS = {
    'rgb': 'data/v_CricketShot_g04_c01_rgb.npy',
    'flow': 'data/v_CricketShot_g04_c01_flow.npy',
}
# Several pre-trained weights are available
# The imagenet variants are inflated from ImageNet 2D weights; the others are pre-trained on video data
_CHECKPOINT_PATHS = {
    'rgb': 'data/checkpoints/rgb_scratch/model.ckpt',
    'flow': 'data/checkpoints/flow_scratch/model.ckpt',
    'rgb_imagenet': 'data/checkpoints/rgb_imagenet/model.ckpt',
    'flow_imagenet': 'data/checkpoints/flow_imagenet/model.ckpt',
}
# File listing the class labels
_LABEL_MAP_PATH = 'data/label_map.txt'
# Number of classes: 400
NUM_CLASSES = 400

Define the parameter:

  • imagenet_pretrained: if True, load the checkpoints inflated from ImageNet 2D weights (rgb_imagenet / flow_imagenet); if False, load the checkpoints trained from scratch on video data (rgb / flow).

imagenet_pretrained = True

# Load the action classes
kinetics_classes = [x.strip() for x in open(_LABEL_MAP_PATH)]
tf.logging.set_verbosity(tf.logging.INFO)

Build the RGB branch of the model.

rgb_input = tf.placeholder(tf.float32,
                           shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 3))
with tf.variable_scope('RGB', reuse=tf.AUTO_REUSE):
    rgb_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    rgb_logits, _ = rgb_model(rgb_input, is_training=False, dropout_keep_prob=1.0)

rgb_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'RGB':
        rgb_variable_map[variable.name.replace(':0', '')] = variable

rgb_saver = tf.train.Saver(var_list=rgb_variable_map, reshape=True)

Build the optical flow branch of the model.

flow_input = tf.placeholder(tf.float32,
                            shape=(1, _SAMPLE_VIDEO_FRAMES, _IMAGE_SIZE, _IMAGE_SIZE, 2))
with tf.variable_scope('Flow', reuse=tf.AUTO_REUSE):
    flow_model = i3d.InceptionI3d(NUM_CLASSES, spatial_squeeze=True, final_endpoint='Logits')
    flow_logits, _ = flow_model(flow_input, is_training=False, dropout_keep_prob=1.0)

flow_variable_map = {}
for variable in tf.global_variables():
    if variable.name.split('/')[0] == 'Flow':
        flow_variable_map[variable.name.replace(':0', '')] = variable

flow_saver = tf.train.Saver(var_list=flow_variable_map, reshape=True)

Combine the two branches into the complete I3D model.

model_logits = rgb_logits + flow_logits
model_predictions = tf.nn.softmax(model_logits)

Run the prediction to obtain the action recognition result for the video.
The prediction inputs are the RGB and optical flow data provided at the beginning:

with tf.Session() as sess:
    feed_dict = {}
    if imagenet_pretrained:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb_imagenet'])  # load the RGB-stream model
    else:
        rgb_saver.restore(sess, _CHECKPOINT_PATHS['rgb'])
    tf.logging.info('RGB checkpoint restored')

    if imagenet_pretrained:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow_imagenet'])  # load the flow-stream model
    else:
        flow_saver.restore(sess, _CHECKPOINT_PATHS['flow'])
    tf.logging.info('Flow checkpoint restored')

    start_time = time.time()
    rgb_sample = np.load(_SAMPLE_PATHS['rgb'])  # load the RGB-stream input data
    tf.logging.info('RGB data loaded, shape=%s', str(rgb_sample.shape))
    feed_dict[rgb_input] = rgb_sample

    flow_sample = np.load(_SAMPLE_PATHS['flow'])  # load the flow-stream input data
    tf.logging.info('Flow data loaded, shape=%s', str(flow_sample.shape))
    feed_dict[flow_input] = flow_sample

    out_logits, out_predictions = sess.run(
        [model_logits, model_predictions], feed_dict=feed_dict)
    out_logits = out_logits[0]
    out_predictions = out_predictions[0]
    sorted_indices = np.argsort(out_predictions)[::-1]

    print('Inference time in sec: %.3f' % float(time.time() - start_time))
    print('Norm of logits: %f' % np.linalg.norm(out_logits))
    print('\nTop classes and probabilities')
    for index in sorted_indices[:20]:
        print(out_predictions[index], out_logits[index], kinetics_classes[index])

WARNING:tensorflow:From /home/ma-user/anaconda3/envs/TensorFlow-1.13.1/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from data/checkpoints/rgb_imagenet/model.ckpt
INFO:tensorflow:RGB checkpoint restored
INFO:tensorflow:Restoring parameters from data/checkpoints/flow_imagenet/model.ckpt
INFO:tensorflow:Flow checkpoint restored
INFO:tensorflow:RGB data loaded, shape=(1, 79, 224, 224, 3)
INFO:tensorflow:Flow data loaded, shape=(1, 79, 224, 224, 2)
Inference time in sec: 1.511
Norm of logits: 138.468643

Top classes and probabilities
1.0 41.813675 playing cricket
1.497162e-09 21.49398 hurling (sport)
3.8431236e-10 20.13411 catching or throwing baseball
1.549242e-10 19.22559 catching or throwing softball
1.1360187e-10 18.915354 hitting baseball
8.801105e-11 18.660116 playing tennis
2.4415466e-11 17.37787 playing kickball
1.153184e-11 16.627766 playing squash or racquetball
6.1318893e-12 15.996157 shooting goal (soccer)
4.391727e-12 15.662376 hammer throw
2.2134352e-12 14.9772005 golf putting
1.6307096e-12 14.67167 throwing discus
1.5456218e-12 14.618079 javelin throw
7.6690325e-13 13.917259 pumping fist
5.1929587e-13 13.527372 shot put
4.2681337e-13 13.331245 celebrating
2.7205462e-13 12.880901 applauding
1.8357015e-13 12.487494 throwing ball
1.6134511e-13 12.358444 dodgeball
1.1388395e-13 12.010078 tap dancing

Follow the link below to be the first to learn about Huawei Cloud's latest technologies:

華為云博客_大數(shù)據(jù)博客_AI博客_云計(jì)算博客_開發(fā)者中心-華為云
